--- a/CookBook/Parsing.thy Tue Dec 16 17:37:39 2008 +0100
+++ b/CookBook/Parsing.thy Tue Dec 16 17:28:05 2008 +0000
@@ -46,20 +46,20 @@
@{ML_response [display] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}
This function will either succeed (as in the two examples above) or raise the exception
- @{ML_text "FAIL"} if no string can be consumed. For example trying to parse
+ @{text "FAIL"} if no string can be consumed. For example trying to parse
@{ML_response_fake [display] "($$ \"x\") (explode \"world\")"
"Exception FAIL raised"}
- will raise the exception @{ML_text "FAIL"}.
+ will raise the exception @{text "FAIL"}.
There are three exceptions used in the parsing combinators:
\begin{itemize}
- \item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing
+ \item @{text "FAIL"} is used to indicate that alternative routes of parsing
might be explored.
- \item @{ML_text "MORE"} indicates that there is not enough input for the parser. For example
- in @{ML_text "($$ \"h\") []"}.
- \item @{ML_text "ABORT"} is the exception which is raised when a dead end is reached.
+ \item @{text "MORE"} indicates that there is not enough input for the parser. For example
+ in @{text "($$ \"h\") []"}.
+ \item @{text "ABORT"} is the exception which is raised when a dead end is reached.
It is used for example in the function @{ML "(op !!)"} (see below).
\end{itemize}
@@ -69,7 +69,7 @@
Slightly more general than the parser @{ML "(op $$)"} is the function @{ML
Scan.one}, in that it takes a predicate as argument and then parses exactly
one item from the input list satisfying this predicate. For example the
- following parser either consumes an @{ML_text [quotes] "h"} or a @{ML_text
+ following parser either consumes an @{text [quotes] "h"} or a @{text
[quotes] "w"}:
@@ -84,7 +84,7 @@
"((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
Two parser can be connected in sequence by using the function @{ML "(op --)"}.
- For example parsing @{ML_text "h"}, @{ML_text "e"} and @{ML_text "l"} in this
+ For example parsing @{text "h"}, @{text "e"} and @{text "l"} in this
sequence can be achieved by
@{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
@@ -100,8 +100,8 @@
Parsers that explore alternatives can be constructed using the function @{ML
"(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the
- result of @{ML_text "p"}, in case it succeeds, otherwise it returns the
- result of @{ML_text "q"}. For example
+ result of @{text "p"}, in case it succeeds, otherwise it returns the
+ result of @{text "q"}. For example
@{ML_response [display]
@@ -129,8 +129,8 @@
"((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}
The parser @{ML "Scan.optional p x" for p x} returns the result of the parser
- @{ML_text "p"}, if it succeeds; otherwise it returns
- the default value @{ML_text "x"}. For example
+ @{text "p"}, if it succeeds; otherwise it returns
+ the default value @{text "x"}. For example
@{ML_response [display]
"let
@@ -155,8 +155,8 @@
end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
The function @{ML "(op !!)"} helps to produce appropriate error messages
- during parsing. For example if one wants to parse that @{ML_text p} is immediately
- followed by @{ML_text q}, or start a completely different parser @{ML_text r},
+ during parsing. For example if one wants to parse that @{text p} is immediately
+ followed by @{text q}, or start a completely different parser @{text r},
one might write
@{ML [display] "(p -- q) || r" for p q r}
@@ -164,31 +164,31 @@
However, this parser is problematic for producing an appropriate error message, in case
the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses with the parser
above the information
- that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in
- which @{ML_text p}
- is present in the input, but not @{ML_text q}. That means @{ML "(p -- q)" for p q} will fail
+ that @{text p} should be followed by @{text q}. To see this consider the case in
+ which @{text p}
+ is present in the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail
and the
- alternative parser @{ML_text r} will be tried. However in many circumstance this will be the wrong
+ alternative parser @{text r} will be tried. However in many circumstance this will be the wrong
parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then
caused by the
- failure of @{ML_text r}, not by the absence of @{ML_text q} in the input. This kind of situation
+ failure of @{text r}, not by the absence of @{text q} in the input. This kind of situation
can be avoided by using the function @{ML "(op !!)"}. This function aborts the whole process of
parsing in case of a failure and invokes an error message. For example if we invoke the parser
@{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
- on @{ML_text [quotes] "hello"}, the parsing succeeds
+ on @{text [quotes] "hello"}, the parsing succeeds
@{ML_response [display]
"(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")"
"(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
- but if we invoke it on @{ML_text [quotes] "world"}
+ but if we invoke it on @{text [quotes] "world"}
@{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
"Exception ABORT raised"}
- the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to
+ the parsing aborts and the error message @{text "foo"} is printed out. In order to
see the error message properly, we need to prefix the parser with the function
@{ML "Scan.error"}. For example
@@ -226,18 +226,18 @@
yields the expected parsing.
- The function @{ML "Scan.repeat p" for p} will apply a parser @{ML_text p} as
+ The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as
often as it succeeds. For example
@{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")"
"([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
- @{ML "Scan.repeat1"} is similar, but requires that the parser @{ML_text "p"}
+ @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"}
succeeds at least once.
- Also note that the parser would have aborted with the exception @{ML_text MORE}, if
- we had run it only on just @{ML_text [quotes] "hhhh"}. This can be avoided using
+ Also note that the parser would have aborted with the exception @{text MORE}, if
+ we had run it only on just @{text [quotes] "hhhh"}. This can be avoided using
the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
them we can write
@@ -277,7 +277,7 @@
The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any
input until a certain marker symbol is reached. In the example below the marker
- symbol is a @{text [quotes] "*"}:
+ symbol is a @{text [quotes] "*"} which causes the parser to stop:
@{ML_response [display]
"let
@@ -293,8 +293,8 @@
After parsing succeeded, one nearly always wants to apply a function on the parsed
items. This is done using the function @{ML "(p >> f)" for p f} which runs
- first the parser @{ML_text p} and upon successful completion applies the
- function @{ML_text f} to the result. For example
+ first the parser @{text p} and upon successful completion applies the
+ function @{text f} to the result. For example
@{ML_response [display]
"let
@@ -307,10 +307,11 @@
doubles the two parsed input strings.
\begin{exercise}\label{ex:scancmts}
- Write a parser parses an input string so that any comment enclosed inside
- @{text "(*\<dots>*)"} is enclosed inside @{text "(**\<dots>**)"} in the output
- string. To enclose a string, you can use the function @{ML "enclose s1 s2 s"
- for s1 s2 s} which produces the string @{ML "s1^s^s2" for s1 s2 s}.
+ Write a parser that parses an input string so that any comment enclosed
+ inside @{text "(*\<dots>*)"} is replaced by a the same comment but enclosed inside
+ @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
+ function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
+ "s1^s^s2" for s1 s2 s}.
\end{exercise}
@@ -358,7 +359,7 @@
Token (\<dots>,(Ident, \"world\"),\<dots>)]"}
produces three tokens where the first and the last are identifiers, since
- @{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match any
+ @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
other syntactic category.\footnote{Note that because of a possible a bug in
the PolyML runtime system the result is printed as @{text "?"}, instead of
the token.} The second indicates a space.
@@ -430,8 +431,8 @@
"((\"|\",\"in\"),[])"}
The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty
- list of items recognised by the parser @{ML_text p}, where the items being parsed
- are separated by the string @{ML_text s}. For example
+ list of items recognised by the parser @{text p}, where the items being parsed
+ are separated by the string @{text s}. For example
@{ML_response [display]
"let
@@ -442,9 +443,9 @@
"([\"in\",\"in\",\"in\"],[\<dots>])"}
@{ML "OuterParse.enum1"} works similarly, except that the parsed list must
- be non-empty. Note that we had to add an @{ML_text [quotes] "foo"} at the end
+ be non-empty. Note that we had to add an @{text [quotes] "foo"} at the end
of the parsed string, otherwise the parser would have consumed all tokens
- and then failed with the exception @{ML_text "MORE"}. Like in the previous
+ and then failed with the exception @{text "MORE"}. Like in the previous
section, we can avoid this exception using the wrapper @{ML
Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML
OuterLex.stopper}. We can write