33 |
33 |
34 section {* Building Generic Parsers *} |
34 section {* Building Generic Parsers *} |
35 |
35 |
36 text {* |
36 text {* |
37 |
37 |
38 Let us first have a look at parsing strings using generic parsing combinators. |
38 Let us first have a look at parsing strings using generic parsing |
39 The function @{ML "$$"} takes a string as argument and will ``consume'' this string from |
39 combinators. The function @{ML "$$"} takes a string as argument and will |
40 a given input list of strings. ``Consume'' in this context means that it will |
40 ``consume'' this string from a given input list of strings. ``Consume'' in |
41 return a pair consisting of this string and the rest of the input list. |
41 this context means that it will return a pair consisting of this string and |
42 For example: |
42 the rest of the input list. For example: |
43 |
43 |
44 @{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
44 @{ML_response [display,gray] |
45 |
45 "($$ \"h\") (Symbol.explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
46 @{ML_response [display,gray] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} |
46 |
47 |
47 @{ML_response [display,gray] |
48 The function @{ML "$$"} will either succeed (as in the two examples above) or raise the exception |
48 "($$ \"w\") (Symbol.explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} |
49 @{text "FAIL"} if no string can be consumed. For example trying to parse |
49 |
50 |
50 The function @{ML "$$"} will either succeed (as in the two examples above) |
51 @{ML_response_fake [display,gray] "($$ \"x\") (explode \"world\")" |
51 or raise the exception @{text "FAIL"} if no string can be consumed. For |
52 "Exception FAIL raised"} |
52 example trying to parse |
53 |
53 |
54 will raise the exception @{text "FAIL"}. |
54 @{ML_response_fake [display,gray] |
55 There are three exceptions used in the parsing combinators: |
55 "($$ \"x\") (Symbol.explode \"world\")" |
|
56 "Exception FAIL raised"} |
|
57 |
|
58 will raise the exception @{text "FAIL"}. There are three exceptions used in |
|
59 the parsing combinators: |
56 |
60 |
57 \begin{itemize} |
61 \begin{itemize} |
58 \item @{text "FAIL"} is used to indicate that alternative routes of parsing |
62 \item @{text "FAIL"} is used to indicate that alternative routes of parsing |
59 might be explored. |
63 might be explored. |
60 \item @{text "MORE"} indicates that there is not enough input for the parser. For example |
64 \item @{text "MORE"} indicates that there is not enough input for the parser. For example |
63 It is used for example in the function @{ML "!!"} (see below). |
67 It is used for example in the function @{ML "!!"} (see below). |
64 \end{itemize} |
68 \end{itemize} |
65 |
69 |
66 However, note that these exceptions are private to the parser and cannot be accessed |
70 However, note that these exceptions are private to the parser and cannot be accessed |
67 by the programmer (for example to handle them). |
71 by the programmer (for example to handle them). |
68 |
72 |
|
73 In the examples above we use the function @{ML Symbol.explode}, instead of the |
|
74 more standard library function @{ML explode}, for obtaining an input list for |
|
75 the parser. The reason is that @{ML Symbol.explode} is aware of character sequences, |
|
76 for example @{text "\\<foo>"}, that have a special meaning in Isabelle. To see |
|
77 the difference consider |
|
78 |
|
79 @{ML_response_fake [display,gray] |
|
80 "let |
|
81 val input = \"\\<foo> bar\" |
|
82 in |
|
83 (explode input, Symbol.explode input) |
|
84 end" |
|
85 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"], |
|
86 [\"\\<foo>\", \" \", \"b\", \"a\", \"r\"])"} |
|
87 |
69 Slightly more general than the parser @{ML "$$"} is the function @{ML |
88 Slightly more general than the parser @{ML "$$"} is the function @{ML |
70 Scan.one}, in that it takes a predicate as argument and then parses exactly |
89 Scan.one}, in that it takes a predicate as argument and then parses exactly |
71 one item from the input list satisfying this predicate. For example the |
90 one item from the input list satisfying this predicate. For example the |
72 following parser either consumes an @{text [quotes] "h"} or a @{text |
91 following parser either consumes an @{text [quotes] "h"} or a @{text |
73 [quotes] "w"}: |
92 [quotes] "w"}: |
74 |
93 |
75 |
|
76 @{ML_response [display,gray] |
94 @{ML_response [display,gray] |
77 "let |
95 "let |
78 val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\") |
96 val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\") |
79 val input1 = explode \"hello\" |
97 val input1 = Symbol.explode \"hello\" |
80 val input2 = explode \"world\" |
98 val input2 = Symbol.explode \"world\" |
81 in |
99 in |
82 (hw input1, hw input2) |
100 (hw input1, hw input2) |
83 end" |
101 end" |
84 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
102 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
85 |
103 |
86 Two parsers can be connected in sequence by using the function @{ML "--"}. |
104 Two parsers can be connected in sequence by using the function @{ML "--"}. |
87 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
105 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
88 order) you can achieve by: |
106 order) you can achieve by: |
89 |
107 |
90 @{ML_response [display,gray] |
108 @{ML_response [display,gray] |
91 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (explode \"hello\")" |
109 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" |
92 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
110 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
93 |
111 |
94 Note how the result of consumed strings builds up on the left as nested pairs. |
112 Note how the result of consumed strings builds up on the left as nested pairs. |
95 |
113 |
96 If, as in the previous example, you want to parse a particular string, |
114 If, as in the previous example, you want to parse a particular string, |
97 then you should use the function @{ML Scan.this_string}: |
115 then you should use the function @{ML Scan.this_string}: |
98 |
116 |
99 @{ML_response [display,gray] |
117 @{ML_response [display,gray] |
100 "Scan.this_string \"hell\" (explode \"hello\")" |
118 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
101 "(\"hell\", [\"o\"])"} |
119 "(\"hell\", [\"o\"])"} |
102 |
120 |
103 Parsers that explore alternatives can be constructed using the function @{ML |
121 Parsers that explore alternatives can be constructed using the function @{ML |
104 "||"}. The parser @{ML "(p || q)" for p q} returns the |
122 "||"}. The parser @{ML "(p || q)" for p q} returns the |
105 result of @{text "p"}, in case it succeeds, otherwise it returns the |
123 result of @{text "p"}, in case it succeeds, otherwise it returns the |
135 the default value @{text "x"}. For example: |
153 the default value @{text "x"}. For example: |
136 |
154 |
137 @{ML_response [display,gray] |
155 @{ML_response [display,gray] |
138 "let |
156 "let |
139 val p = Scan.optional ($$ \"h\") \"x\" |
157 val p = Scan.optional ($$ \"h\") \"x\" |
140 val input1 = explode \"hello\" |
158 val input1 = Symbol.explode \"hello\" |
141 val input2 = explode \"world\" |
159 val input2 = Symbol.explode \"world\" |
142 in |
160 in |
143 (p input1, p input2) |
161 (p input1, p input2) |
144 end" |
162 end" |
145 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
163 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
146 |
164 |
148 be given. Instead, the result is wrapped as an @{text "option"}-type. For example: |
166 be given. Instead, the result is wrapped as an @{text "option"}-type. For example: |
149 |
167 |
150 @{ML_response [display,gray] |
168 @{ML_response [display,gray] |
151 "let |
169 "let |
152 val p = Scan.option ($$ \"h\") |
170 val p = Scan.option ($$ \"h\") |
153 val input1 = explode \"hello\" |
171 val input1 = Symbol.explode \"hello\" |
154 val input2 = explode \"world\" |
172 val input2 = Symbol.explode \"world\" |
155 in |
173 in |
156 (p input1, p input2) |
174 (p input1, p input2) |
157 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
175 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
158 |
176 |
159 The function @{ML "!!"} helps to produce appropriate error messages |
177 The function @{ML "!!"} helps to produce appropriate error messages |
180 @{ML [display,gray] "!! (fn _ => \"foo\") ($$ \"h\")"} |
198 @{ML [display,gray] "!! (fn _ => \"foo\") ($$ \"h\")"} |
181 |
199 |
182 on @{text [quotes] "hello"}, the parsing succeeds |
200 on @{text [quotes] "hello"}, the parsing succeeds |
183 |
201 |
184 @{ML_response [display,gray] |
202 @{ML_response [display,gray] |
185 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" |
203 "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"hello\")" |
186 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
204 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
187 |
205 |
188 but if you invoke it on @{text [quotes] "world"} |
206 but if you invoke it on @{text [quotes] "world"} |
189 |
207 |
190 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" |
208 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
191 "Exception ABORT raised"} |
209 "Exception ABORT raised"} |
192 |
210 |
193 then the parsing aborts and the error message @{text "foo"} is printed. In order to |
211 then the parsing aborts and the error message @{text "foo"} is printed. In order to |
194 see the error message properly, you need to prefix the parser with the function |
212 see the error message properly, you need to prefix the parser with the function |
195 @{ML "Scan.error"}. For example: |
213 @{ML "Scan.error"}. For example: |
217 text {* |
235 text {* |
218 Running this parser with the arguments |
236 Running this parser with the arguments |
219 @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and |
237 @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and |
220 the input @{text [quotes] "holle"} |
238 the input @{text [quotes] "holle"} |
221 |
239 |
222 @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")" |
240 @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"holle\")" |
223 "Exception ERROR \"h is not followed by e\" raised"} |
241 "Exception ERROR \"h is not followed by e\" raised"} |
224 |
242 |
225 produces the correct error message. Running it with |
243 produces the correct error message. Running it with |
226 |
244 |
227 @{ML_response [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")" |
245 @{ML_response [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"wworld\")" |
228 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
246 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
229 |
247 |
230 yields the expected parsing. |
248 yields the expected parsing. |
231 |
249 |
232 The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as |
250 The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as |
233 often as it succeeds. For example: |
251 often as it succeeds. For example: |
234 |
252 |
235 @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" |
253 @{ML_response [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" |
236 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
254 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
237 |
255 |
238 Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function |
256 Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function |
239 @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} |
257 @{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"} |
240 succeeds at least once. |
258 succeeds at least once. |
242 Also note that the parser would have aborted with the exception @{text MORE}, if |
260 Also note that the parser would have aborted with the exception @{text MORE}, if |
243 you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using |
261 you had run it only on just @{text [quotes] "hhhh"}. This can be avoided by using |
244 the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With |
262 the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With |
245 them you can write: |
263 them you can write: |
246 |
264 |
247 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" |
265 @{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
248 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
266 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
249 |
267 |
250 @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings; |
268 @{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings; |
251 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
269 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
252 manually wrapping is often already done by the surrounding infrastructure. |
270 manually wrapping is often already done by the surrounding infrastructure. |
267 end of the input string (i.e.~stopper symbol). |
285 end of the input string (i.e.~stopper symbol). |
268 |
286 |
269 The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can |
287 The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can |
270 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
288 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
271 |
289 |
272 @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")" |
290 @{ML_response_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" |
273 "Exception FAIL raised"} |
291 "Exception FAIL raised"} |
274 |
292 |
275 fails, while |
293 fails, while |
276 |
294 |
277 @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"world\")" |
295 @{ML_response [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" |
278 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
296 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
279 |
297 |
280 succeeds. |
298 succeeds. |
281 |
299 |
282 The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any |
300 The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any |
284 symbol is a @{text [quotes] "*"}. |
302 symbol is a @{text [quotes] "*"}. |
285 |
303 |
286 @{ML_response [display,gray] |
304 @{ML_response [display,gray] |
287 "let |
305 "let |
288 val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof)) |
306 val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof)) |
289 val input1 = explode \"fooooo\" |
307 val input1 = Symbol.explode \"fooooo\" |
290 val input2 = explode \"foo*ooo\" |
308 val input2 = Symbol.explode \"foo*ooo\" |
291 in |
309 in |
292 (Scan.finite Symbol.stopper p input1, |
310 (Scan.finite Symbol.stopper p input1, |
293 Scan.finite Symbol.stopper p input2) |
311 Scan.finite Symbol.stopper p input2) |
294 end" |
312 end" |
295 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
313 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
302 |
320 |
303 @{ML_response [display,gray] |
321 @{ML_response [display,gray] |
304 "let |
322 "let |
305 fun double (x, y) = (x ^ x, y ^ y) |
323 fun double (x, y) = (x ^ x, y ^ y) |
306 in |
324 in |
307 (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\") |
325 (($$ \"h\") -- ($$ \"e\") >> double) (Symbol.explode \"hello\") |
308 end" |
326 end" |
309 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} |
327 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} |
310 |
328 |
311 doubles the two parsed input strings; or |
329 doubles the two parsed input strings; or |
312 |
330 |
313 @{ML_response [display,gray] |
331 @{ML_response [display,gray] |
314 "let |
332 "let |
315 val p = Scan.repeat (Scan.one Symbol.not_eof) |
333 val p = Scan.repeat (Scan.one Symbol.not_eof) |
316 val input = explode \"foo bar foo\" |
334 val input = Symbol.explode \"foo bar foo\" |
317 in |
335 in |
318 Scan.finite Symbol.stopper (p >> implode) input |
336 Scan.finite Symbol.stopper (p >> implode) input |
319 end" |
337 end" |
320 "(\"foo bar foo\",[])"} |
338 "(\"foo bar foo\",[])"} |
321 |
339 |
326 |
344 |
327 The function @{ML Scan.ahead} parses some input, but leaves the original |
345 The function @{ML Scan.ahead} parses some input, but leaves the original |
328 input unchanged. For example: |
346 input unchanged. For example: |
329 |
347 |
330 @{ML_response [display,gray] |
348 @{ML_response [display,gray] |
331 "Scan.ahead (Scan.this_string \"foo\") (explode \"foo\")" |
349 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
332 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
350 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
333 |
351 |
334 The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies |
352 The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies |
335 the given parser to the second component of the pair and leaves the first component |
353 the given parser to the second component of the pair and leaves the first component |
336 untouched. For example |
354 untouched. For example |
337 |
355 |
338 @{ML_response [display,gray] |
356 @{ML_response [display,gray] |
339 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, explode \"hello\")" |
357 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
340 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} |
358 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} |
341 |
359 |
342 (FIXME: In which situations is this useful? Give examples.) |
360 (FIXME: In which situations is this useful? Give examples.) |
343 |
361 |
344 \begin{exercise}\label{ex:scancmts} |
362 \begin{exercise}\label{ex:scancmts} |