diff -r 065f472c09ab -r f3794c231898 CookBook/Parsing.thy --- a/CookBook/Parsing.thy Tue Dec 16 17:37:39 2008 +0100 +++ b/CookBook/Parsing.thy Tue Dec 16 17:28:05 2008 +0000 @@ -46,20 +46,20 @@ @{ML_response [display] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} This function will either succeed (as in the two examples above) or raise the exception - @{ML_text "FAIL"} if no string can be consumed. For example trying to parse + @{text "FAIL"} if no string can be consumed. For example trying to parse @{ML_response_fake [display] "($$ \"x\") (explode \"world\")" "Exception FAIL raised"} - will raise the exception @{ML_text "FAIL"}. + will raise the exception @{text "FAIL"}. There are three exceptions used in the parsing combinators: \begin{itemize} - \item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing + \item @{text "FAIL"} is used to indicate that alternative routes of parsing might be explored. - \item @{ML_text "MORE"} indicates that there is not enough input for the parser. For example - in @{ML_text "($$ \"h\") []"}. - \item @{ML_text "ABORT"} is the exception which is raised when a dead end is reached. + \item @{text "MORE"} indicates that there is not enough input for the parser. For example + in @{text "($$ \"h\") []"}. + \item @{text "ABORT"} is the exception which is raised when a dead end is reached. It is used for example in the function @{ML "(op !!)"} (see below). \end{itemize} @@ -69,7 +69,7 @@ Slightly more general than the parser @{ML "(op $$)"} is the function @{ML Scan.one}, in that it takes a predicate as argument and then parses exactly one item from the input list satisfying this predicate. For example the - following parser either consumes an @{ML_text [quotes] "h"} or a @{ML_text + following parser either consumes an @{text [quotes] "h"} or a @{text [quotes] "w"}: @@ -84,7 +84,7 @@ "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} Two parser can be connected in sequence by using the function @{ML "(op --)"}. - For example parsing @{ML_text "h"}, @{ML_text "e"} and @{ML_text "l"} in this + For example parsing @{text "h"}, @{text "e"} and @{text "l"} in this sequence can be achieved by @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" @@ -100,8 +100,8 @@ Parsers that explore alternatives can be constructed using the function @{ML "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the - result of @{ML_text "p"}, in case it succeeds, otherwise it returns the - result of @{ML_text "q"}. For example + result of @{text "p"}, in case it succeeds, otherwise it returns the + result of @{text "q"}. For example @{ML_response [display] @@ -129,8 +129,8 @@ "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"} The parser @{ML "Scan.optional p x" for p x} returns the result of the parser - @{ML_text "p"}, if it succeeds; otherwise it returns - the default value @{ML_text "x"}. For example + @{text "p"}, if it succeeds; otherwise it returns + the default value @{text "x"}. For example @{ML_response [display] "let @@ -155,8 +155,8 @@ end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} The function @{ML "(op !!)"} helps to produce appropriate error messages - during parsing. For example if one wants to parse that @{ML_text p} is immediately - followed by @{ML_text q}, or start a completely different parser @{ML_text r}, + during parsing. For example if one wants to parse that @{text p} is immediately + followed by @{text q}, or start a completely different parser @{text r}, one might write @{ML [display] "(p -- q) || r" for p q r} @@ -164,31 +164,31 @@ However, this parser is problematic for producing an appropriate error message, in case the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses with the parser above the information - that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in - which @{ML_text p} - is present in the input, but not @{ML_text q}. That means @{ML "(p -- q)" for p q} will fail + that @{text p} should be followed by @{text q}. To see this consider the case in + which @{text p} + is present in the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail and the - alternative parser @{ML_text r} will be tried. However in many circumstance this will be the wrong + alternative parser @{text r} will be tried. However in many circumstance this will be the wrong parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then caused by the - failure of @{ML_text r}, not by the absence of @{ML_text q} in the input. This kind of situation + failure of @{text r}, not by the absence of @{text q} in the input. This kind of situation can be avoided by using the function @{ML "(op !!)"}. This function aborts the whole process of parsing in case of a failure and invokes an error message. For example if we invoke the parser @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} - on @{ML_text [quotes] "hello"}, the parsing succeeds + on @{text [quotes] "hello"}, the parsing succeeds @{ML_response [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} - but if we invoke it on @{ML_text [quotes] "world"} + but if we invoke it on @{text [quotes] "world"} @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" "Exception ABORT raised"} - the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to + the parsing aborts and the error message @{text "foo"} is printed out. In order to see the error message properly, we need to prefix the parser with the function @{ML "Scan.error"}. For example @@ -226,18 +226,18 @@ yields the expected parsing. - The function @{ML "Scan.repeat p" for p} will apply a parser @{ML_text p} as + The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as often as it succeeds. For example @{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function - @{ML "Scan.repeat1"} is similar, but requires that the parser @{ML_text "p"} + @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} succeeds at least once. - Also note that the parser would have aborted with the exception @{ML_text MORE}, if - we had run it only on just @{ML_text [quotes] "hhhh"}. This can be avoided using + Also note that the parser would have aborted with the exception @{text MORE}, if + we had run it only on just @{text [quotes] "hhhh"}. This can be avoided using the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With them we can write @@ -277,7 +277,7 @@ The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any input until a certain marker symbol is reached. In the example below the marker - symbol is a @{text [quotes] "*"}: + symbol is a @{text [quotes] "*"} which causes the parser to stop: @{ML_response [display] "let @@ -293,8 +293,8 @@ After parsing succeeded, one nearly always wants to apply a function on the parsed items. This is done using the function @{ML "(p >> f)" for p f} which runs - first the parser @{ML_text p} and upon successful completion applies the - function @{ML_text f} to the result. For example + first the parser @{text p} and upon successful completion applies the + function @{text f} to the result. For example @{ML_response [display] "let @@ -307,10 +307,11 @@ doubles the two parsed input strings. \begin{exercise}\label{ex:scancmts} - Write a parser parses an input string so that any comment enclosed inside - @{text "(*\*)"} is enclosed inside @{text "(**\**)"} in the output - string. To enclose a string, you can use the function @{ML "enclose s1 s2 s" - for s1 s2 s} which produces the string @{ML "s1^s^s2" for s1 s2 s}. + Write a parser that parses an input string so that any comment enclosed + inside @{text "(*\*)"} is replaced by a the same comment but enclosed inside + @{text "(**\**)"} in the output string. To enclose a string, you can use the + function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML + "s1^s^s2" for s1 s2 s}. \end{exercise} @@ -358,7 +359,7 @@ Token (\,(Ident, \"world\"),\)]"} produces three tokens where the first and the last are identifiers, since - @{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match any + @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any other syntactic category.\footnote{Note that because of a possible a bug in the PolyML runtime system the result is printed as @{text "?"}, instead of the token.} The second indicates a space. @@ -430,8 +431,8 @@ "((\"|\",\"in\"),[])"} The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty - list of items recognised by the parser @{ML_text p}, where the items being parsed - are separated by the string @{ML_text s}. For example + list of items recognised by the parser @{text p}, where the items being parsed + are separated by the string @{text s}. For example @{ML_response [display] "let @@ -442,9 +443,9 @@ "([\"in\",\"in\",\"in\"],[\])"} @{ML "OuterParse.enum1"} works similarly, except that the parsed list must - be non-empty. Note that we had to add an @{ML_text [quotes] "foo"} at the end + be non-empty. Note that we had to add an @{text [quotes] "foo"} at the end of the parsed string, otherwise the parser would have consumed all tokens - and then failed with the exception @{ML_text "MORE"}. Like in the previous + and then failed with the exception @{text "MORE"}. Like in the previous section, we can avoid this exception using the wrapper @{ML Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML OuterLex.stopper}. We can write