--- a/CookBook/Parsing.thy Mon Dec 15 10:48:27 2008 +0100
+++ b/CookBook/Parsing.thy Tue Dec 16 08:08:44 2008 +0000
@@ -92,10 +92,17 @@
Note how the result of consumed strings builds up on the left as nested pairs.
- Parsers that explore
- alternatives can be constructed using the function @{ML "(op ||)"}. For example, the
- parser @{ML "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds,
- otherwise it returns the result of @{ML_text "q"}. For example
+ If, as in the previous example, one wants to parse a particular string,
+ then one should rather use the function @{ML Scan.this_string}:
+
+ @{ML_response [display] "Scan.this_string \"hel\" (explode \"hello\")"
+ "(\"hel\", [\"l\", \"o\"])"}
+
+ Parsers that explore alternatives can be constructed using the function @{ML
+ "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the
+ result of @{ML_text "p"}, in case it succeeds, otherwise it returns the
+ result of @{ML_text "q"}. For example
+
@{ML_response [display]
"let
@@ -240,6 +247,50 @@
However, this kind of manually wrapping needs to be done only very rarely
in practise, because it is already done by the infrastructure for you.
+ The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any
+ string as in
+
+ @{ML_response [display]
+"let
+ val p = Scan.repeat (Scan.one Symbol.not_eof)
+ val input = (explode \"foo bar foo\")
+in
+ Scan.finite Symbol.stopper p input
+end"
+"([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
+
+ where the function @{ML Symbol.not_eof} ensures that we do not read beyond the
+ end of the input string.
+
+ The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can
+ parse the input then the whole parser fails; if not, then the second is tried. Therefore
+
+ @{ML_response_fake_both [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"
+ "Exception FAIL raised"}
+
+ fails, while
+
+ @{ML_response [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"world\")"
+ "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
+
+ succeeds.
+
+ The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any
+ input until a certain marker symbol is reached. In the example below the marker
+ symbol is a @{text [quotes] "*"}:
+
+ @{ML_response [display]
+"let
+ val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
+ val input1 = (explode \"fooooo\")
+ val input2 = (explode \"foo*ooo\")
+in
+ (Scan.finite Symbol.stopper p input1,
+ Scan.finite Symbol.stopper p input2)
+end"
+"(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
+ ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
+
After parsing succeeded, one nearly always wants to apply a function on the parsed
items. This is done using the function @{ML "(p >> f)" for p f} which runs
first the parser @{ML_text p} and upon successful completion applies the
@@ -255,6 +306,14 @@
doubles the two parsed input strings.
+ \begin{exercise}\label{ex:scancmts}
+ Write a parser parses an input string so that any comment enclosed inside
+ @{text "(*\<dots>*)"} is enclosed inside @{text "(**\<dots>**)"} in the output
+ string. To enclose a string, you can use the function @{ML "enclose s1 s2 s"
+ for s1 s2 s} which produces the string @{ML "s1^s^s2" for s1 s2 s}.
+ \end{exercise}
+
+
The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
the given parser to the second component of the pair and leaves the first component
untouched. For example