96 that @{ML explode in Symbol} is aware of character |
96 that @{ML explode in Symbol} is aware of character |
97 sequences, for example \<open>\<foo>\<close>, that have a special meaning in |
97 sequences, for example \<open>\<foo>\<close>, that have a special meaning in |
98 Isabelle. To see the difference consider |
98 Isabelle. To see the difference consider |
99 |
99 |
100 @{ML_matchresult_fake [display,gray] |
100 @{ML_matchresult_fake [display,gray] |
101 "let |
101 \<open>let |
102 val input = \"\<foo> bar\" |
102 val input = "\<foo> bar" |
103 in |
103 in |
104 (String.explode input, Symbol.explode input) |
104 (String.explode input, Symbol.explode input) |
105 end" |
105 end\<close> |
106 "([\"\\\", \"<\", \"f\", \"o\", \"o\", \">\", \" \", \"b\", \"a\", \"r\"], |
106 \<open>(["\\", "<", "f", "o", "o", ">", " ", "b", "a", "r"], |
107 [\"\<foo>\", \" \", \"b\", \"a\", \"r\"])"} |
107 ["\<foo>", " ", "b", "a", "r"])\<close>} |
108 |
108 |
109 Slightly more general than the parser @{ML "$$"} is the function |
109 Slightly more general than the parser @{ML \<open>$$\<close>} is the function |
110 @{ML_ind one in Scan}, in that it takes a predicate as argument and |
110 @{ML_ind one in Scan}, in that it takes a predicate as argument and |
111 then parses exactly |
111 then parses exactly |
112 one item from the input list satisfying this predicate. For example the |
112 one item from the input list satisfying this predicate. For example the |
113 following parser either consumes an @{text [quotes] "h"} or a @{text |
113 following parser either consumes an @{text [quotes] "h"} or a @{text |
114 [quotes] "w"}: |
114 [quotes] "w"}: |
115 |
115 |
116 @{ML_matchresult [display,gray] |
116 @{ML_matchresult [display,gray] |
117 "let |
117 \<open>let |
118 val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\") |
118 val hw = Scan.one (fn x => x = "h" orelse x = "w") |
119 val input1 = Symbol.explode \"hello\" |
119 val input1 = Symbol.explode "hello" |
120 val input2 = Symbol.explode \"world\" |
120 val input2 = Symbol.explode "world" |
121 in |
121 in |
122 (hw input1, hw input2) |
122 (hw input1, hw input2) |
123 end" |
123 end\<close> |
124 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
124 \<open>(("h", ["e", "l", "l", "o"]),("w", ["o", "r", "l", "d"]))\<close>} |
125 |
125 |
126 Two parsers can be connected in sequence by using the function @{ML_ind "--" in Scan}. |
126 Two parsers can be connected in sequence by using the function @{ML_ind "--" in Scan}. |
127 For example parsing \<open>h\<close>, \<open>e\<close> and \<open>l\<close> (in this |
127 For example parsing \<open>h\<close>, \<open>e\<close> and \<open>l\<close> (in this |
128 order) you can achieve by: |
128 order) you can achieve by: |
129 |
129 |
130 @{ML_matchresult [display,gray] |
130 @{ML_matchresult [display,gray] |
131 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (Symbol.explode \"hello\")" |
131 \<open>($$ "h" -- $$ "e" -- $$ "l") (Symbol.explode "hello")\<close> |
132 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
132 \<open>((("h", "e"), "l"), ["l", "o"])\<close>} |
133 |
133 |
134 Note how the result of consumed strings builds up on the left as nested pairs. |
134 Note how the result of consumed strings builds up on the left as nested pairs. |
135 |
135 |
136 If, as in the previous example, you want to parse a particular string, |
136 If, as in the previous example, you want to parse a particular string, |
137 then you can use the function @{ML_ind this_string in Scan}. |
137 then you can use the function @{ML_ind this_string in Scan}. |
138 |
138 |
139 @{ML_matchresult [display,gray] |
139 @{ML_matchresult [display,gray] |
140 "Scan.this_string \"hell\" (Symbol.explode \"hello\")" |
140 \<open>Scan.this_string "hell" (Symbol.explode "hello")\<close> |
141 "(\"hell\", [\"o\"])"} |
141 \<open>("hell", ["o"])\<close>} |
142 |
142 |
143 Parsers that explore alternatives can be constructed using the function |
143 Parsers that explore alternatives can be constructed using the function |
144 @{ML_ind "||" in Scan}. The parser @{ML "(p || q)" for p q} returns the |
144 @{ML_ind "||" in Scan}. The parser @{ML \<open>(p || q)\<close> for p q} returns the |
145 result of \<open>p\<close>, in case it succeeds; otherwise it returns the |
145 result of \<open>p\<close>, in case it succeeds; otherwise it returns the |
146 result of \<open>q\<close>. For example: |
146 result of \<open>q\<close>. For example: |
147 |
147 |
148 |
148 |
149 @{ML_matchresult [display,gray] |
149 @{ML_matchresult [display,gray] |
150 "let |
150 \<open>let |
151 val hw = $$ \"h\" || $$ \"w\" |
151 val hw = $$ "h" || $$ "w" |
152 val input1 = Symbol.explode \"hello\" |
152 val input1 = Symbol.explode "hello" |
153 val input2 = Symbol.explode \"world\" |
153 val input2 = Symbol.explode "world" |
154 in |
154 in |
155 (hw input1, hw input2) |
155 (hw input1, hw input2) |
156 end" |
156 end\<close> |
157 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
157 \<open>(("h", ["e", "l", "l", "o"]), ("w", ["o", "r", "l", "d"]))\<close>} |
158 |
158 |
159 The functions @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} work like the sequencing |
159 The functions @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} work like the sequencing |
160 function for parsers, except that they discard the item being parsed by the |
160 function for parsers, except that they discard the item being parsed by the |
161 first (respectively second) parser. That means the item being dropped is the |
161 first (respectively second) parser. That means the item being dropped is the |
162 one that @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} ``point'' away. |
162 one that @{ML_ind "|--" in Scan} and @{ML_ind "--|" in Scan} ``point'' away. |
163 For example: |
163 For example: |
164 |
164 |
165 @{ML_matchresult [display,gray] |
165 @{ML_matchresult [display,gray] |
166 "let |
166 \<open>let |
167 val just_e = $$ \"h\" |-- $$ \"e\" |
167 val just_e = $$ "h" |-- $$ "e" |
168 val just_h = $$ \"h\" --| $$ \"e\" |
168 val just_h = $$ "h" --| $$ "e" |
169 val input = Symbol.explode \"hello\" |
169 val input = Symbol.explode "hello" |
170 in |
170 in |
171 (just_e input, just_h input) |
171 (just_e input, just_h input) |
172 end" |
172 end\<close> |
173 "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"} |
173 \<open>(("e", ["l", "l", "o"]), ("h", ["l", "l", "o"]))\<close>} |
174 |
174 |
175 The parser @{ML "Scan.optional p x" for p x} returns the result of the parser |
175 The parser @{ML \<open>Scan.optional p x\<close> for p x} returns the result of the parser |
176 \<open>p\<close>, in case it succeeds; otherwise it returns |
176 \<open>p\<close>, in case it succeeds; otherwise it returns |
177 the default value \<open>x\<close>. For example: |
177 the default value \<open>x\<close>. For example: |
178 |
178 |
179 @{ML_matchresult [display,gray] |
179 @{ML_matchresult [display,gray] |
180 "let |
180 \<open>let |
181 val p = Scan.optional ($$ \"h\") \"x\" |
181 val p = Scan.optional ($$ "h") "x" |
182 val input1 = Symbol.explode \"hello\" |
182 val input1 = Symbol.explode "hello" |
183 val input2 = Symbol.explode \"world\" |
183 val input2 = Symbol.explode "world" |
184 in |
184 in |
185 (p input1, p input2) |
185 (p input1, p input2) |
186 end" |
186 end\<close> |
187 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
187 \<open>(("h", ["e", "l", "l", "o"]), ("x", ["w", "o", "r", "l", "d"]))\<close>} |
188 |
188 |
189 The function @{ML_ind option in Scan} works similarly, except no default value can |
189 The function @{ML_ind option in Scan} works similarly, except no default value can |
190 be given. Instead, the result is wrapped as an \<open>option\<close>-type. For example: |
190 be given. Instead, the result is wrapped as an \<open>option\<close>-type. For example: |
191 |
191 |
192 @{ML_matchresult [display,gray] |
192 @{ML_matchresult [display,gray] |
193 "let |
193 \<open>let |
194 val p = Scan.option ($$ \"h\") |
194 val p = Scan.option ($$ "h") |
195 val input1 = Symbol.explode \"hello\" |
195 val input1 = Symbol.explode "hello" |
196 val input2 = Symbol.explode \"world\" |
196 val input2 = Symbol.explode "world" |
197 in |
197 in |
198 (p input1, p input2) |
198 (p input1, p input2) |
199 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
199 end\<close> \<open>((SOME "h", ["e", "l", "l", "o"]), (NONE, ["w", "o", "r", "l", "d"]))\<close>} |
200 |
200 |
201 The function @{ML_ind ahead in Scan} parses some input, but leaves the original |
201 The function @{ML_ind ahead in Scan} parses some input, but leaves the original |
202 input unchanged. For example: |
202 input unchanged. For example: |
203 |
203 |
204 @{ML_matchresult [display,gray] |
204 @{ML_matchresult [display,gray] |
205 "Scan.ahead (Scan.this_string \"foo\") (Symbol.explode \"foo\")" |
205 \<open>Scan.ahead (Scan.this_string "foo") (Symbol.explode "foo")\<close> |
206 "(\"foo\", [\"f\", \"o\", \"o\"])"} |
206 \<open>("foo", ["f", "o", "o"])\<close>} |
207 |
207 |
208 The function @{ML_ind "!!" in Scan} helps with producing appropriate error messages |
208 The function @{ML_ind "!!" in Scan} helps with producing appropriate error messages |
209 during parsing. For example if you want to parse \<open>p\<close> immediately |
209 during parsing. For example if you want to parse \<open>p\<close> immediately |
210 followed by \<open>q\<close>, or start a completely different parser \<open>r\<close>, |
210 followed by \<open>q\<close>, or start a completely different parser \<open>r\<close>, |
211 you might write: |
211 you might write: |
212 |
212 |
213 @{ML [display,gray] "(p -- q) || r" for p q r} |
213 @{ML [display,gray] \<open>(p -- q) || r\<close> for p q r} |
214 |
214 |
215 However, this parser is problematic for producing a useful error |
215 However, this parser is problematic for producing a useful error |
216 message, if the parsing of @{ML "(p -- q)" for p q} fails. Because with the |
216 message, if the parsing of @{ML \<open>(p -- q)\<close> for p q} fails. Because with the |
217 parser above you lose the information that \<open>p\<close> should be followed by \<open>q\<close>. |
217 parser above you lose the information that \<open>p\<close> should be followed by \<open>q\<close>. |
218 To see this assume that \<open>p\<close> is present in the input, but it is not |
218 To see this assume that \<open>p\<close> is present in the input, but it is not |
219 followed by \<open>q\<close>. That means @{ML "(p -- q)" for p q} will fail and |
219 followed by \<open>q\<close>. That means @{ML \<open>(p -- q)\<close> for p q} will fail and |
220 hence the alternative parser \<open>r\<close> will be tried. However, in many |
220 hence the alternative parser \<open>r\<close> will be tried. However, in many |
221 circumstances this will be the wrong parser for the input ``\<open>p\<close>-followed-by-something'' |
221 circumstances this will be the wrong parser for the input ``\<open>p\<close>-followed-by-something'' |
222 and therefore will also fail. The error message is then caused by the failure |
222 and therefore will also fail. The error message is then caused by the failure |
223 of \<open>r\<close>, not by the absence of \<open>q\<close> in the input. Such |
223 of \<open>r\<close>, not by the absence of \<open>q\<close> in the input. Such |
224 situation can be avoided when using the function @{ML "!!"}. This function |
224 situation can be avoided when using the function @{ML \<open>!!\<close>}. This function |
225 aborts the whole process of parsing in case of a failure and prints an error |
225 aborts the whole process of parsing in case of a failure and prints an error |
226 message. For example if you invoke the parser |
226 message. For example if you invoke the parser |
227 |
227 |
228 |
228 |
229 @{ML [display,gray] "!! (fn _ => fn _ =>\"foo\") ($$ \"h\")"} |
229 @{ML [display,gray] \<open>!! (fn _ => fn _ =>"foo") ($$ "h")\<close>} |
230 |
230 |
231 on @{text [quotes] "hello"}, the parsing succeeds |
231 on @{text [quotes] "hello"}, the parsing succeeds |
232 |
232 |
233 @{ML_matchresult [display,gray] |
233 @{ML_matchresult [display,gray] |
234 "(!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"hello\")" |
234 \<open>(!! (fn _ => fn _ => "foo") ($$ "h")) (Symbol.explode "hello")\<close> |
235 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
235 \<open>("h", ["e", "l", "l", "o"])\<close>} |
236 |
236 |
237 but if you invoke it on @{text [quotes] "world"} |
237 but if you invoke it on @{text [quotes] "world"} |
238 |
238 |
239 @{ML_matchresult_fake [display,gray] "(!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
239 @{ML_matchresult_fake [display,gray] \<open>(!! (fn _ => fn _ => "foo") ($$ "h")) (Symbol.explode "world")\<close> |
240 "Exception ABORT raised"} |
240 \<open>Exception ABORT raised\<close>} |
241 |
241 |
242 then the parsing aborts and the error message \<open>foo\<close> is printed. In order to |
242 then the parsing aborts and the error message \<open>foo\<close> is printed. In order to |
243 see the error message properly, you need to prefix the parser with the function |
243 see the error message properly, you need to prefix the parser with the function |
244 @{ML_ind error in Scan}. For example: |
244 @{ML_ind error in Scan}. For example: |
245 |
245 |
246 @{ML_matchresult_fake [display,gray] |
246 @{ML_matchresult_fake [display,gray] |
247 "Scan.error (!! (fn _ => fn _ => \"foo\") ($$ \"h\")) (Symbol.explode \"world\")" |
247 \<open>Scan.error (!! (fn _ => fn _ => "foo") ($$ "h")) (Symbol.explode "world")\<close> |
248 "Exception Error \"foo\" raised"} |
248 \<open>Exception Error "foo" raised\<close>} |
249 |
249 |
250 This kind of ``prefixing'' to see the correct error message is usually done by wrappers |
250 This kind of ``prefixing'' to see the correct error message is usually done by wrappers |
251 such as @{ML_ind local_theory in Outer_Syntax} |
251 such as @{ML_ind local_theory in Outer_Syntax} |
252 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
252 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
253 |
253 |
254 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
254 Let us now return to our example of parsing @{ML \<open>(p -- q) || r\<close> for p q |
255 r}. If you want to generate the correct error message for failure |
255 r}. If you want to generate the correct error message for failure |
256 of parsing \<open>p\<close>-followed-by-\<open>q\<close>, then you have to write: |
256 of parsing \<open>p\<close>-followed-by-\<open>q\<close>, then you have to write: |
257 \<close> |
257 \<close> |
258 |
258 |
259 ML %grayML\<open>fun p_followed_by_q p q r = |
259 ML %grayML\<open>fun p_followed_by_q p q r = |
267 text \<open> |
267 text \<open> |
268 Running this parser with the arguments |
268 Running this parser with the arguments |
269 @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and |
269 @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and |
270 the input @{text [quotes] "holle"} |
270 the input @{text [quotes] "holle"} |
271 |
271 |
272 @{ML_matchresult_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"holle\")" |
272 @{ML_matchresult_fake [display,gray] \<open>Scan.error (p_followed_by_q "h" "e" "w") (Symbol.explode "holle")\<close> |
273 "Exception ERROR \"h is not followed by e\" raised"} |
273 \<open>Exception ERROR "h is not followed by e" raised\<close>} |
274 |
274 |
275 produces the correct error message. Running it with |
275 produces the correct error message. Running it with |
276 |
276 |
277 @{ML_matchresult [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (Symbol.explode \"wworld\")" |
277 @{ML_matchresult [display,gray] \<open>Scan.error (p_followed_by_q "h" "e" "w") (Symbol.explode "wworld")\<close> |
278 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
278 \<open>(("w", "w"), ["o", "r", "l", "d"])\<close>} |
279 |
279 |
280 yields the expected parsing. |
280 yields the expected parsing. |
281 |
281 |
282 The function @{ML "Scan.repeat p" for p} will apply a parser \<open>p\<close> as |
282 The function @{ML \<open>Scan.repeat p\<close> for p} will apply a parser \<open>p\<close> as |
283 long as it succeeds. For example: |
283 long as it succeeds. For example: |
284 |
284 |
285 @{ML_matchresult [display,gray] "Scan.repeat ($$ \"h\") (Symbol.explode \"hhhhello\")" |
285 @{ML_matchresult [display,gray] \<open>Scan.repeat ($$ "h") (Symbol.explode "hhhhello")\<close> |
286 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
286 \<open>(["h", "h", "h", "h"], ["e", "l", "l", "o"])\<close>} |
287 |
287 |
288 Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function |
288 Note that @{ML_ind repeat in Scan} stores the parsed items in a list. The function |
289 @{ML_ind repeat1 in Scan} is similar, but requires that the parser \<open>p\<close> |
289 @{ML_ind repeat1 in Scan} is similar, but requires that the parser \<open>p\<close> |
290 succeeds at least once. |
290 succeeds at least once. |
291 |
291 |
292 Also note that the parser would have aborted with the exception \<open>MORE\<close>, if |
292 Also note that the parser would have aborted with the exception \<open>MORE\<close>, if |
293 you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using |
293 you had it run with the string @{text [quotes] "hhhh"}. This can be avoided by using |
294 the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' |
294 the wrapper @{ML_ind finite in Scan} and the ``stopper-token'' |
295 @{ML_ind stopper in Symbol}. With them you can write: |
295 @{ML_ind stopper in Symbol}. With them you can write: |
296 |
296 |
297 @{ML_matchresult [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (Symbol.explode \"hhhh\")" |
297 @{ML_matchresult [display,gray] \<open>Scan.finite Symbol.stopper (Scan.repeat ($$ "h")) (Symbol.explode "hhhh")\<close> |
298 "([\"h\", \"h\", \"h\", \"h\"], [])"} |
298 \<open>(["h", "h", "h", "h"], [])\<close>} |
299 |
299 |
300 The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings; |
300 The function @{ML stopper in Symbol} is the ``end-of-input'' indicator for parsing strings; |
301 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
301 other stoppers need to be used when parsing, for example, tokens. However, this kind of |
302 manually wrapping is often already done by the surrounding infrastructure. |
302 manually wrapping is often already done by the surrounding infrastructure. |
303 |
303 |
304 The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any |
304 The function @{ML_ind repeat in Scan} can be used with @{ML_ind one in Scan} to read any |
305 string as in |
305 string as in |
306 |
306 |
307 @{ML_matchresult [display,gray] |
307 @{ML_matchresult [display,gray] |
308 "let |
308 \<open>let |
309 val p = Scan.repeat (Scan.one Symbol.not_eof) |
309 val p = Scan.repeat (Scan.one Symbol.not_eof) |
310 val input = Symbol.explode \"foo bar foo\" |
310 val input = Symbol.explode "foo bar foo" |
311 in |
311 in |
312 Scan.finite Symbol.stopper p input |
312 Scan.finite Symbol.stopper p input |
313 end" |
313 end\<close> |
314 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
314 \<open>(["f", "o", "o", " ", "b", "a", "r", " ", "f", "o", "o"], [])\<close>} |
315 |
315 |
316 where the function @{ML_ind not_eof in Symbol} ensures that we do not read beyond the |
316 where the function @{ML_ind not_eof in Symbol} ensures that we do not read beyond the |
317 end of the input string (i.e.~stopper symbol). |
317 end of the input string (i.e.~stopper symbol). |
318 |
318 |
319 The function @{ML_ind unless in Scan} takes two parsers: if the first one can |
319 The function @{ML_ind unless in Scan} takes two parsers: if the first one can |
320 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
320 parse the input, then the whole parser fails; if not, then the second is tried. Therefore |
321 |
321 |
322 @{ML_matchresult_fake_both [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"hello\")" |
322 @{ML_matchresult_fake_both [display,gray] \<open>Scan.unless ($$ "h") ($$ "w") (Symbol.explode "hello")\<close> |
323 "Exception FAIL raised"} |
323 \<open>Exception FAIL raised\<close>} |
324 |
324 |
325 fails, while |
325 fails, while |
326 |
326 |
327 @{ML_matchresult [display,gray] "Scan.unless ($$ \"h\") ($$ \"w\") (Symbol.explode \"world\")" |
327 @{ML_matchresult [display,gray] \<open>Scan.unless ($$ "h") ($$ "w") (Symbol.explode "world")\<close> |
328 "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"} |
328 \<open>("w",["o", "r", "l", "d"])\<close>} |
329 |
329 |
330 succeeds. |
330 succeeds. |
331 |
331 |
332 The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can |
332 The functions @{ML_ind repeat in Scan} and @{ML_ind unless in Scan} can |
333 be combined to read any input until a certain marker symbol is reached. In the |
333 be combined to read any input until a certain marker symbol is reached. In the |
334 example below the marker symbol is a @{text [quotes] "*"}. |
334 example below the marker symbol is a @{text [quotes] "*"}. |
335 |
335 |
336 @{ML_matchresult [display,gray] |
336 @{ML_matchresult [display,gray] |
337 "let |
337 \<open>let |
338 val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof)) |
338 val p = Scan.repeat (Scan.unless ($$ "*") (Scan.one Symbol.not_eof)) |
339 val input1 = Symbol.explode \"fooooo\" |
339 val input1 = Symbol.explode "fooooo" |
340 val input2 = Symbol.explode \"foo*ooo\" |
340 val input2 = Symbol.explode "foo*ooo" |
341 in |
341 in |
342 (Scan.finite Symbol.stopper p input1, |
342 (Scan.finite Symbol.stopper p input1, |
343 Scan.finite Symbol.stopper p input2) |
343 Scan.finite Symbol.stopper p input2) |
344 end" |
344 end\<close> |
345 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
345 \<open>((["f", "o", "o", "o", "o", "o"], []), |
346 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
346 (["f", "o", "o"], ["*", "o", "o", "o"]))\<close>} |
347 |
347 |
348 |
348 |
349 After parsing is done, you almost always want to apply a function to the parsed |
349 After parsing is done, you almost always want to apply a function to the parsed |
350 items. One way to do this is the function @{ML_ind ">>" in Scan} where |
350 items. One way to do this is the function @{ML_ind ">>" in Scan} where |
351 @{ML "(p >> f)" for p f} runs |
351 @{ML \<open>(p >> f)\<close> for p f} runs |
352 first the parser \<open>p\<close> and upon successful completion applies the |
352 first the parser \<open>p\<close> and upon successful completion applies the |
353 function \<open>f\<close> to the result. For example |
353 function \<open>f\<close> to the result. For example |
354 |
354 |
355 @{ML_matchresult [display,gray] |
355 @{ML_matchresult [display,gray] |
356 "let |
356 \<open>let |
357 fun double (x, y) = (x ^ x, y ^ y) |
357 fun double (x, y) = (x ^ x, y ^ y) |
358 val parser = $$ \"h\" -- $$ \"e\" |
358 val parser = $$ "h" -- $$ "e" |
359 in |
359 in |
360 (parser >> double) (Symbol.explode \"hello\") |
360 (parser >> double) (Symbol.explode "hello") |
361 end" |
361 end\<close> |
362 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} |
362 \<open>(("hh", "ee"), ["l", "l", "o"])\<close>} |
363 |
363 |
364 doubles the two parsed input strings; or |
364 doubles the two parsed input strings; or |
365 |
365 |
366 @{ML_matchresult [display,gray] |
366 @{ML_matchresult [display,gray] |
367 "let |
367 \<open>let |
368 val p = Scan.repeat (Scan.one Symbol.not_eof) |
368 val p = Scan.repeat (Scan.one Symbol.not_eof) |
369 val input = Symbol.explode \"foo bar foo\" |
369 val input = Symbol.explode "foo bar foo" |
370 in |
370 in |
371 Scan.finite Symbol.stopper (p >> implode) input |
371 Scan.finite Symbol.stopper (p >> implode) input |
372 end" |
372 end\<close> |
373 "(\"foo bar foo\",[])"} |
373 \<open>("foo bar foo",[])\<close>} |
374 |
374 |
375 where the single-character strings in the parsed output are transformed |
375 where the single-character strings in the parsed output are transformed |
376 back into one string. |
376 back into one string. |
377 |
377 |
378 The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies |
378 The function @{ML_ind lift in Scan} takes a parser and a pair as arguments. This function applies |
379 the given parser to the second component of the pair and leaves the first component |
379 the given parser to the second component of the pair and leaves the first component |
380 untouched. For example |
380 untouched. For example |
381 |
381 |
382 @{ML_matchresult [display,gray] |
382 @{ML_matchresult [display,gray] |
383 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, Symbol.explode \"hello\")" |
383 \<open>Scan.lift ($$ "h" -- $$ "e") (1, Symbol.explode "hello")\<close> |
384 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} |
384 \<open>(("h", "e"), (1, ["l", "l", "o"]))\<close>} |
385 |
385 |
386 \footnote{\bf FIXME: In which situations is \<open>lift\<close> useful? Give examples.} |
386 \footnote{\bf FIXME: In which situations is \<open>lift\<close> useful? Give examples.} |
387 |
387 |
388 Be aware of recursive parsers. Suppose you want to read strings separated by |
388 Be aware of recursive parsers. Suppose you want to read strings separated by |
389 commas and by parentheses into a tree datastructure; for example, generating |
389 commas and by parentheses into a tree datastructure; for example, generating |
567 see the complete list. |
566 see the complete list. |
568 |
567 |
569 The parser @{ML_ind "$$$" in Parse} parses a single keyword. For example: |
568 The parser @{ML_ind "$$$" in Parse} parses a single keyword. For example: |
570 |
569 |
571 @{ML_matchresult [display,gray] |
570 @{ML_matchresult [display,gray] |
572 "let |
571 \<open>let |
573 val input1 = filtered_input \"where for\" |
572 val input1 = filtered_input "where for" |
574 val input2 = filtered_input \"| in\" |
573 val input2 = filtered_input "| in" |
575 in |
574 in |
576 (Parse.$$$ \"where\" input1, Parse.$$$ \"|\" input2) |
575 (Parse.$$$ "where" input1, Parse.$$$ "|" input2) |
577 end" |
576 end\<close> |
578 "((\"where\",_), (\"|\",_))"} |
577 \<open>(("where",_), ("|",_))\<close>} |
579 |
578 |
580 Any non-keyword string can be parsed with the function @{ML_ind reserved in Parse}. |
579 Any non-keyword string can be parsed with the function @{ML_ind reserved in Parse}. |
581 For example: |
580 For example: |
582 |
581 |
583 @{ML_matchresult [display,gray] |
582 @{ML_matchresult [display,gray] |
584 "let |
583 \<open>let |
585 val p = Parse.reserved \"bar\" |
584 val p = Parse.reserved "bar" |
586 val input = filtered_input \"bar\" |
585 val input = filtered_input "bar" |
587 in |
586 in |
588 p input |
587 p input |
589 end" |
588 end\<close> |
590 "(\"bar\",[])"} |
589 \<open>("bar",[])\<close>} |
591 |
590 |
592 Like before, you can sequentially connect parsers with @{ML "--"}. For example: |
591 Like before, you can sequentially connect parsers with @{ML \<open>--\<close>}. For example: |
593 |
592 |
594 @{ML_matchresult [display,gray] |
593 @{ML_matchresult [display,gray] |
595 "let |
594 \<open>let |
596 val input = filtered_input \"| in\" |
595 val input = filtered_input "| in" |
597 in |
596 in |
598 (Parse.$$$ \"|\" -- Parse.$$$ \"in\") input |
597 (Parse.$$$ "|" -- Parse.$$$ "in") input |
599 end" |
598 end\<close> |
600 "((\"|\", \"in\"), [])"} |
599 \<open>(("|", "in"), [])\<close>} |
601 |
600 |
602 The parser @{ML "Parse.enum s p" for s p} parses a possibly empty |
601 The parser @{ML \<open>Parse.enum s p\<close> for s p} parses a possibly empty |
603 list of items recognised by the parser \<open>p\<close>, where the items being parsed |
602 list of items recognised by the parser \<open>p\<close>, where the items being parsed |
604 are separated by the string \<open>s\<close>. For example: |
603 are separated by the string \<open>s\<close>. For example: |
605 |
604 |
606 @{ML_matchresult [display,gray] |
605 @{ML_matchresult [display,gray] |
607 "let |
606 \<open>let |
608 val input = filtered_input \"in | in | in foo\" |
607 val input = filtered_input "in | in | in foo" |
609 in |
608 in |
610 (Parse.enum \"|\" (Parse.$$$ \"in\")) input |
609 (Parse.enum "|" (Parse.$$$ "in")) input |
611 end" |
610 end\<close> |
612 "([\"in\", \"in\", \"in\"], [_])"} |
611 \<open>(["in", "in", "in"], [_])\<close>} |
613 |
612 |
614 The function @{ML_ind enum1 in Parse} works similarly, except that the |
613 The function @{ML_ind enum1 in Parse} works similarly, except that the |
615 parsed list must be non-empty. Note that we had to add a string @{text |
614 parsed list must be non-empty. Note that we had to add a string @{text |
616 [quotes] "foo"} at the end of the parsed string, otherwise the parser would |
615 [quotes] "foo"} at the end of the parsed string, otherwise the parser would |
617 have consumed all tokens and then failed with the exception \<open>MORE\<close>. Like in the previous section, we can avoid this exception using the |
616 have consumed all tokens and then failed with the exception \<open>MORE\<close>. Like in the previous section, we can avoid this exception using the |
618 wrapper @{ML Scan.finite}. This time, however, we have to use the |
617 wrapper @{ML Scan.finite}. This time, however, we have to use the |
619 ``stopper-token'' @{ML Token.stopper}. We can write: |
618 ``stopper-token'' @{ML Token.stopper}. We can write: |
620 |
619 |
621 @{ML_matchresult [display,gray] |
620 @{ML_matchresult [display,gray] |
622 "let |
621 \<open>let |
623 val input = filtered_input \"in | in | in\" |
622 val input = filtered_input "in | in | in" |
624 val p = Parse.enum \"|\" (Parse.$$$ \"in\") |
623 val p = Parse.enum "|" (Parse.$$$ "in") |
625 in |
624 in |
626 Scan.finite Token.stopper p input |
625 Scan.finite Token.stopper p input |
627 end" |
626 end\<close> |
628 "([\"in\", \"in\", \"in\"], [])"} |
627 \<open>(["in", "in", "in"], [])\<close>} |
629 |
628 |
630 The following function will help to run examples. |
629 The following function will help to run examples. |
631 \<close> |
630 \<close> |
632 |
631 |
633 ML %grayML\<open>fun parse p input = Scan.finite Token.stopper (Scan.error p) input\<close> |
632 ML %grayML\<open>fun parse p input = Scan.finite Token.stopper (Scan.error p) input\<close> |
634 |
633 |
635 text \<open> |
634 text \<open> |
636 The function @{ML_ind "!!!" in Parse} can be used to force termination |
635 The function @{ML_ind "!!!" in Parse} can be used to force termination |
637 of the parser in case of a dead end, just like @{ML "Scan.!!"} (see previous |
636 of the parser in case of a dead end, just like @{ML \<open>Scan.!!\<close>} (see previous |
638 section). A difference, however, is that the error message of @{ML |
637 section). A difference, however, is that the error message of @{ML \<open>Parse.!!!\<close>} is fixed to be @{text [quotes] "Outer syntax error"} |
639 "Parse.!!!"} is fixed to be @{text [quotes] "Outer syntax error"} |
|
640 together with a relatively precise description of the failure. For example: |
638 together with a relatively precise description of the failure. For example: |
641 |
639 |
642 @{ML_matchresult_fake [display,gray] |
640 @{ML_matchresult_fake [display,gray] |
643 "let |
641 \<open>let |
644 val input = filtered_input \"in |\" |
642 val input = filtered_input "in |" |
645 val parse_bar_then_in = Parse.$$$ \"|\" -- Parse.$$$ \"in\" |
643 val parse_bar_then_in = Parse.$$$ "|" -- Parse.$$$ "in" |
646 in |
644 in |
647 parse (Parse.!!! parse_bar_then_in) input |
645 parse (Parse.!!! parse_bar_then_in) input |
648 end" |
646 end\<close> |
649 "Exception ERROR \"Outer syntax error: keyword \"|\" expected, |
647 \<open>Exception ERROR "Outer syntax error: keyword "|" expected, |
650 but keyword in was found\" raised" |
648 but keyword in was found" raised\<close> |
651 } |
649 } |
652 |
650 |
653 \begin{exercise} (FIXME) |
651 \begin{exercise} (FIXME) |
654 A type-identifier, for example @{typ "'a"}, is a token of |
652 A type-identifier, for example @{typ "'a"}, is a token of |
655 kind @{ML_ind Keyword in Token}. It can be parsed using |
653 kind @{ML_ind Keyword in Token}. It can be parsed using |