44 |
44 |
45 @{ML_response [display] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
45 @{ML_response [display] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
46 @{ML_response [display] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} |
46 @{ML_response [display] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} |
47 |
47 |
48 This function will either succeed (as in the two examples above) or raise the exception |
48 This function will either succeed (as in the two examples above) or raise the exception |
49 @{ML_text "FAIL"} if no string can be consumed. For example trying to parse |
49 @{text "FAIL"} if no string can be consumed. For example trying to parse |
50 |
50 |
51 @{ML_response_fake [display] "($$ \"x\") (explode \"world\")" |
51 @{ML_response_fake [display] "($$ \"x\") (explode \"world\")" |
52 "Exception FAIL raised"} |
52 "Exception FAIL raised"} |
53 |
53 |
54 will raise the exception @{ML_text "FAIL"}. |
54 will raise the exception @{text "FAIL"}. |
55 There are three exceptions used in the parsing combinators: |
55 There are three exceptions used in the parsing combinators: |
56 |
56 |
57 \begin{itemize} |
57 \begin{itemize} |
58 \item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing |
58 \item @{text "FAIL"} is used to indicate that alternative routes of parsing |
59 might be explored. |
59 might be explored. |
60 \item @{ML_text "MORE"} indicates that there is not enough input for the parser. For example |
60 \item @{text "MORE"} indicates that there is not enough input for the parser. For example |
61 in @{ML_text "($$ \"h\") []"}. |
61 in @{text "($$ \"h\") []"}. |
62 \item @{ML_text "ABORT"} is the exception which is raised when a dead end is reached. |
62 \item @{text "ABORT"} is the exception which is raised when a dead end is reached. |
63 It is used for example in the function @{ML "(op !!)"} (see below). |
63 It is used for example in the function @{ML "(op !!)"} (see below). |
64 \end{itemize} |
64 \end{itemize} |
65 |
65 |
66 However, note that these exceptions are private to the parser and cannot be accessed |
66 However, note that these exceptions are private to the parser and cannot be accessed |
67 by the programmer (for example to handle them). |
67 by the programmer (for example to handle them). |
68 |
68 |
69 Slightly more general than the parser @{ML "(op $$)"} is the function @{ML |
69 Slightly more general than the parser @{ML "(op $$)"} is the function @{ML |
70 Scan.one}, in that it takes a predicate as argument and then parses exactly |
70 Scan.one}, in that it takes a predicate as argument and then parses exactly |
71 one item from the input list satisfying this predicate. For example the |
71 one item from the input list satisfying this predicate. For example the |
72 following parser either consumes an @{ML_text [quotes] "h"} or a @{ML_text |
72 following parser either consumes an @{text [quotes] "h"} or a @{text |
73 [quotes] "w"}: |
73 [quotes] "w"}: |
74 |
74 |
75 |
75 |
76 @{ML_response [display] |
76 @{ML_response [display] |
77 "let |
77 "let |
82 (hw input1, hw input2) |
82 (hw input1, hw input2) |
83 end" |
83 end" |
84 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
84 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
85 |
85 |
86 Two parser can be connected in sequence by using the function @{ML "(op --)"}. |
86 Two parser can be connected in sequence by using the function @{ML "(op --)"}. |
87 For example parsing @{ML_text "h"}, @{ML_text "e"} and @{ML_text "l"} in this |
87 For example parsing @{text "h"}, @{text "e"} and @{text "l"} in this |
88 sequence can be achieved by |
88 sequence can be achieved by |
89 |
89 |
90 @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" |
90 @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" |
91 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
91 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
92 |
92 |
98 @{ML_response [display] "Scan.this_string \"hel\" (explode \"hello\")" |
98 @{ML_response [display] "Scan.this_string \"hel\" (explode \"hello\")" |
99 "(\"hel\", [\"l\", \"o\"])"} |
99 "(\"hel\", [\"l\", \"o\"])"} |
100 |
100 |
101 Parsers that explore alternatives can be constructed using the function @{ML |
101 Parsers that explore alternatives can be constructed using the function @{ML |
102 "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the |
102 "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the |
103 result of @{ML_text "p"}, in case it succeeds, otherwise it returns the |
103 result of @{text "p"}, in case it succeeds, otherwise it returns the |
104 result of @{ML_text "q"}. For example |
104 result of @{text "q"}. For example |
105 |
105 |
106 |
106 |
107 @{ML_response [display] |
107 @{ML_response [display] |
108 "let |
108 "let |
109 val hw = ($$ \"h\") || ($$ \"w\") |
109 val hw = ($$ \"h\") || ($$ \"w\") |
127 (just_e input, just_h input) |
127 (just_e input, just_h input) |
128 end" |
128 end" |
129 "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"} |
129 "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"} |
130 |
130 |
131 The parser @{ML "Scan.optional p x" for p x} returns the result of the parser |
131 The parser @{ML "Scan.optional p x" for p x} returns the result of the parser |
132 @{ML_text "p"}, if it succeeds; otherwise it returns |
132 @{text "p"}, if it succeeds; otherwise it returns |
133 the default value @{ML_text "x"}. For example |
133 the default value @{text "x"}. For example |
134 |
134 |
135 @{ML_response [display] |
135 @{ML_response [display] |
136 "let |
136 "let |
137 val p = Scan.optional ($$ \"h\") \"x\" |
137 val p = Scan.optional ($$ \"h\") \"x\" |
138 val input1 = (explode \"hello\") |
138 val input1 = (explode \"hello\") |
153 in |
153 in |
154 (p input1, p input2) |
154 (p input1, p input2) |
155 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
155 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
156 |
156 |
157 The function @{ML "(op !!)"} helps to produce appropriate error messages |
157 The function @{ML "(op !!)"} helps to produce appropriate error messages |
158 during parsing. For example if one wants to parse that @{ML_text p} is immediately |
158 during parsing. For example if one wants to parse that @{text p} is immediately |
159 followed by @{ML_text q}, or start a completely different parser @{ML_text r}, |
159 followed by @{text q}, or start a completely different parser @{text r}, |
160 one might write |
160 one might write |
161 |
161 |
162 @{ML [display] "(p -- q) || r" for p q r} |
162 @{ML [display] "(p -- q) || r" for p q r} |
163 |
163 |
164 However, this parser is problematic for producing an appropriate error message, in case |
164 However, this parser is problematic for producing an appropriate error message, in case |
165 the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses with the parser |
165 the parsing of @{ML "(p -- q)" for p q} fails. Because in that case one loses with the parser |
166 above the information |
166 above the information |
167 that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in |
167 that @{text p} should be followed by @{text q}. To see this consider the case in |
168 which @{ML_text p} |
168 which @{text p} |
169 is present in the input, but not @{ML_text q}. That means @{ML "(p -- q)" for p q} will fail |
169 is present in the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail |
170 and the |
170 and the |
171 alternative parser @{ML_text r} will be tried. However in many circumstance this will be the wrong |
171 alternative parser @{text r} will be tried. However in many circumstance this will be the wrong |
172 parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then |
172 parser for the input ``p-followed-by-q'' and therefore will also fail. The error message is then |
173 caused by the |
173 caused by the |
174 failure of @{ML_text r}, not by the absence of @{ML_text q} in the input. This kind of situation |
174 failure of @{text r}, not by the absence of @{text q} in the input. This kind of situation |
175 can be avoided by using the function @{ML "(op !!)"}. This function aborts the whole process of |
175 can be avoided by using the function @{ML "(op !!)"}. This function aborts the whole process of |
176 parsing in case of a failure and invokes an error message. For example if we invoke the parser |
176 parsing in case of a failure and invokes an error message. For example if we invoke the parser |
177 |
177 |
178 @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} |
178 @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} |
179 |
179 |
180 on @{ML_text [quotes] "hello"}, the parsing succeeds |
180 on @{text [quotes] "hello"}, the parsing succeeds |
181 |
181 |
182 @{ML_response [display] |
182 @{ML_response [display] |
183 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" |
183 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" |
184 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
184 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
185 |
185 |
186 but if we invoke it on @{ML_text [quotes] "world"} |
186 but if we invoke it on @{text [quotes] "world"} |
187 |
187 |
188 @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" |
188 @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" |
189 "Exception ABORT raised"} |
189 "Exception ABORT raised"} |
190 |
190 |
191 the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to |
191 the parsing aborts and the error message @{text "foo"} is printed out. In order to |
192 see the error message properly, we need to prefix the parser with the function |
192 see the error message properly, we need to prefix the parser with the function |
193 @{ML "Scan.error"}. For example |
193 @{ML "Scan.error"}. For example |
194 |
194 |
195 @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))" |
195 @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))" |
196 "Exception Error \"foo\" raised"} |
196 "Exception Error \"foo\" raised"} |
224 @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")" |
224 @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")" |
225 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
225 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
226 |
226 |
227 yields the expected parsing. |
227 yields the expected parsing. |
228 |
228 |
229 The function @{ML "Scan.repeat p" for p} will apply a parser @{ML_text p} as |
229 The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as |
230 often as it succeeds. For example |
230 often as it succeeds. For example |
231 |
231 |
232 @{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" |
232 @{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" |
233 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
233 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
234 |
234 |
235 Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function |
235 Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function |
236 @{ML "Scan.repeat1"} is similar, but requires that the parser @{ML_text "p"} |
236 @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} |
237 succeeds at least once. |
237 succeeds at least once. |
238 |
238 |
239 Also note that the parser would have aborted with the exception @{ML_text MORE}, if |
239 Also note that the parser would have aborted with the exception @{text MORE}, if |
240 we had run it only on just @{ML_text [quotes] "hhhh"}. This can be avoided using |
240 we had run it only on just @{text [quotes] "hhhh"}. This can be avoided using |
241 the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With |
241 the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With |
242 them we can write |
242 them we can write |
243 |
243 |
244 @{ML_response [display] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" |
244 @{ML_response [display] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" |
245 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
245 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
291 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
291 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
292 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
292 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
293 |
293 |
294 After parsing succeeded, one nearly always wants to apply a function on the parsed |
294 After parsing succeeded, one nearly always wants to apply a function on the parsed |
295 items. This is done using the function @{ML "(p >> f)" for p f} which runs |
295 items. This is done using the function @{ML "(p >> f)" for p f} which runs |
296 first the parser @{ML_text p} and upon successful completion applies the |
296 first the parser @{text p} and upon successful completion applies the |
297 function @{ML_text f} to the result. For example |
297 function @{text f} to the result. For example |
298 |
298 |
299 @{ML_response [display] |
299 @{ML_response [display] |
300 "let |
300 "let |
301 fun double (x,y) = (x^x,y^y) |
301 fun double (x,y) = (x^x,y^y) |
302 in |
302 in |
305 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} |
305 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} |
306 |
306 |
307 doubles the two parsed input strings. |
307 doubles the two parsed input strings. |
308 |
308 |
309 \begin{exercise}\label{ex:scancmts} |
309 \begin{exercise}\label{ex:scancmts} |
310 Write a parser parses an input string so that any comment enclosed inside |
310 Write a parser that parses an input string so that any comment enclosed |
311 @{text "(*\<dots>*)"} is enclosed inside @{text "(**\<dots>**)"} in the output |
311 inside @{text "(*\<dots>*)"} is replaced by a the same comment but enclosed inside |
312 string. To enclose a string, you can use the function @{ML "enclose s1 s2 s" |
312 @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the |
313 for s1 s2 s} which produces the string @{ML "s1^s^s2" for s1 s2 s}. |
313 function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML |
|
314 "s1^s^s2" for s1 s2 s}. |
314 \end{exercise} |
315 \end{exercise} |
315 |
316 |
316 |
317 |
317 The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies |
318 The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies |
318 the given parser to the second component of the pair and leaves the first component |
319 the given parser to the second component of the pair and leaves the first component |
356 "[Token (\<dots>,(Ident, \"hello\"),\<dots>), |
357 "[Token (\<dots>,(Ident, \"hello\"),\<dots>), |
357 Token (\<dots>,(Space, \" \"),\<dots>), |
358 Token (\<dots>,(Space, \" \"),\<dots>), |
358 Token (\<dots>,(Ident, \"world\"),\<dots>)]"} |
359 Token (\<dots>,(Ident, \"world\"),\<dots>)]"} |
359 |
360 |
360 produces three tokens where the first and the last are identifiers, since |
361 produces three tokens where the first and the last are identifiers, since |
361 @{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match any |
362 @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any |
362 other syntactic category.\footnote{Note that because of a possible a bug in |
363 other syntactic category.\footnote{Note that because of a possible a bug in |
363 the PolyML runtime system the result is printed as @{text "?"}, instead of |
364 the PolyML runtime system the result is printed as @{text "?"}, instead of |
364 the token.} The second indicates a space. |
365 the token.} The second indicates a space. |
365 |
366 |
366 Many parsing functions later on will require spaces, comments and the like |
367 Many parsing functions later on will require spaces, comments and the like |
428 (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input |
429 (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input |
429 end" |
430 end" |
430 "((\"|\",\"in\"),[])"} |
431 "((\"|\",\"in\"),[])"} |
431 |
432 |
432 The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty |
433 The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty |
433 list of items recognised by the parser @{ML_text p}, where the items being parsed |
434 list of items recognised by the parser @{text p}, where the items being parsed |
434 are separated by the string @{ML_text s}. For example |
435 are separated by the string @{text s}. For example |
435 |
436 |
436 @{ML_response [display] |
437 @{ML_response [display] |
437 "let |
438 "let |
438 val input = filtered_input \"in | in | in foo\" |
439 val input = filtered_input \"in | in | in foo\" |
439 in |
440 in |
440 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
441 (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input |
441 end" |
442 end" |
442 "([\"in\",\"in\",\"in\"],[\<dots>])"} |
443 "([\"in\",\"in\",\"in\"],[\<dots>])"} |
443 |
444 |
444 @{ML "OuterParse.enum1"} works similarly, except that the parsed list must |
445 @{ML "OuterParse.enum1"} works similarly, except that the parsed list must |
445 be non-empty. Note that we had to add an @{ML_text [quotes] "foo"} at the end |
446 be non-empty. Note that we had to add an @{text [quotes] "foo"} at the end |
446 of the parsed string, otherwise the parser would have consumed all tokens |
447 of the parsed string, otherwise the parser would have consumed all tokens |
447 and then failed with the exception @{ML_text "MORE"}. Like in the previous |
448 and then failed with the exception @{text "MORE"}. Like in the previous |
448 section, we can avoid this exception using the wrapper @{ML |
449 section, we can avoid this exception using the wrapper @{ML |
449 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML |
450 Scan.finite}. This time, however, we have to use the ``stopper-token'' @{ML |
450 OuterLex.stopper}. We can write |
451 OuterLex.stopper}. We can write |
451 |
452 |
452 @{ML_response [display] |
453 @{ML_response [display] |