diff -r 0b55402ae95e -r 126646f2aa88 CookBook/Parsing.thy --- a/CookBook/Parsing.thy Mon Dec 15 10:48:27 2008 +0100 +++ b/CookBook/Parsing.thy Tue Dec 16 08:08:44 2008 +0000 @@ -92,10 +92,17 @@ Note how the result of consumed strings builds up on the left as nested pairs. - Parsers that explore - alternatives can be constructed using the function @{ML "(op ||)"}. For example, the - parser @{ML "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds, - otherwise it returns the result of @{ML_text "q"}. For example + If, as in the previous example, one wants to parse a particular string, + then one should rather use the function @{ML Scan.this_string}: + + @{ML_response [display] "Scan.this_string \"hel\" (explode \"hello\")" + "(\"hel\", [\"l\", \"o\"])"} + + Parsers that explore alternatives can be constructed using the function @{ML + "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the + result of @{ML_text "p"}, in case it succeeds, otherwise it returns the + result of @{ML_text "q"}. For example + @{ML_response [display] "let @@ -240,6 +247,50 @@ However, this kind of manually wrapping needs to be done only very rarely in practise, because it is already done by the infrastructure for you. + The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any + string as in + + @{ML_response [display] +"let + val p = Scan.repeat (Scan.one Symbol.not_eof) + val input = (explode \"foo bar foo\") +in + Scan.finite Symbol.stopper p input +end" +"([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} + + where the function @{ML Symbol.not_eof} ensures that we do not read beyond the + end of the input string. + + The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can + parse the input then the whole parser fails; if not, then the second is tried. Therefore + + @{ML_response_fake_both [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")" + "Exception FAIL raised"} + + fails, while + + @{ML_response [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"world\")" + "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} + + succeeds. + + The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any + input until a certain marker symbol is reached. In the example below the marker + symbol is a @{text [quotes] "*"}: + + @{ML_response [display] +"let + val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof)) + val input1 = (explode \"fooooo\") + val input2 = (explode \"foo*ooo\") +in + (Scan.finite Symbol.stopper p input1, + Scan.finite Symbol.stopper p input2) +end" +"(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), + ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} + After parsing succeeded, one nearly always wants to apply a function on the parsed items. This is done using the function @{ML "(p >> f)" for p f} which runs first the parser @{ML_text p} and upon successful completion applies the @@ -255,6 +306,14 @@ doubles the two parsed input strings. + \begin{exercise}\label{ex:scancmts} + Write a parser parses an input string so that any comment enclosed inside + @{text "(*\*)"} is enclosed inside @{text "(**\**)"} in the output + string. To enclose a string, you can use the function @{ML "enclose s1 s2 s" + for s1 s2 s} which produces the string @{ML "s1^s^s2" for s1 s2 s}. + \end{exercise} + + The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies the given parser to the second component of the pair and leaves the first component untouched. For example