41 |
40 |
42 @{ML_response [display] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
41 @{ML_response [display] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
43 @{ML_response [display] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} |
42 @{ML_response [display] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} |
44 |
43 |
45 This function will either succeed (as in the two examples above) or raise the exeption |
44 This function will either succeed (as in the two examples above) or raise the exeption |
46 @{ML_text "FAIL"} if no string can be consumed. For example in the next example |
45 @{ML_text "FAIL"} if no string can be consumed. For example trying to parse |
47 try to parse: |
46 |
48 |
47 @{ML_response_fake [display] "($$ \"x\") (explode \"world\")" |
49 @{ML_text [display] "($$ \"x\") (explode \"world\")"} |
48 "Exception FAIL raised"} |
50 |
49 |
|
50 will raise the exception @{ML_text "FAIL"}. |
51 There are three exceptions used in the parsing combinators: |
51 There are three exceptions used in the parsing combinators: |
52 |
52 |
53 (FIXME: describe exceptions) |
|
54 |
|
55 \begin{itemize} |
53 \begin{itemize} |
56 \item @{ML_text "FAIL"} |
54 \item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing |
57 \item @{ML_text "MORE"} @{ML_text "($$ \"h\") []"} |
55 might be explored. |
58 \item @{ML_text "ABORT"} dead end |
56 \item @{ML_text "MORE"} indicates that there is not enough input for the parser. For example |
|
57 in @{ML_text "($$ \"h\") []"}. |
|
58 \item @{ML_text "ABORT"} is the exception which is raised when a dead end is reached. |
|
59 It is used for example in the function @{ML "(op !!)"} (see below). |
59 \end{itemize} |
60 \end{itemize} |
60 |
61 |
61 Slightly more general than @{ML "(op $$)"} is the function @{ML Scan.one} in that it |
62 |
|
63 Slightly more general than the parser @{ML "(op $$)"} is the function @{ML Scan.one}, in that it |
62 takes a predicate as argument and then parses exactly one item from the input list |
64 takes a predicate as argument and then parses exactly one item from the input list |
63 satisfying this prediate. For example the following parser either consumes an @{ML_text "h"} |
65 satisfying this prediate. For example the following parser either consumes an @{ML_text "h"} |
64 or a @{ML_text "w"}: |
66 or a @{ML_text "w"}: |
65 |
67 |
66 @{ML_response [display] |
68 @{ML_response [display] |
72 (hw input1, hw input2) |
74 (hw input1, hw input2) |
73 end" |
75 end" |
74 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
76 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
75 |
77 |
76 Two parser can be connected in sequence by using the funtion @{ML "(op --)"}. |
78 Two parser can be connected in sequence by using the funtion @{ML "(op --)"}. |
77 For example |
79 For example parsing @{ML_text "h"}, @{ML_text "e"} and @{ML_text "l"} in this |
|
80 sequence can be achieved by |
78 |
81 |
79 @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" |
82 @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" |
80 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
83 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
81 |
84 |
82 Note how the result of consumed strings builds up on the left as nested pairs. |
85 Note how the result of consumed strings builds up on the left as nested pairs. |
83 |
86 |
84 Parsers that explore |
87 Parsers that explore |
85 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the |
88 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the |
86 parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, |
89 parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, if it succeeds, |
87 otherwise it returns the result of @{ML_text "q"}. For example |
90 otherwise it returns the result of @{ML_text "q"}. For example |
88 |
91 |
89 @{ML_response [display] |
92 @{ML_response [display] |
90 "let |
93 "let |
91 val hw = ($$ \"h\") || ($$ \"w\") |
94 val hw = ($$ \"h\") || ($$ \"w\") |
129 followed by @{ML_text q}, or start a completely different parser @{ML_text r}, |
132 followed by @{ML_text q}, or start a completely different parser @{ML_text r}, |
130 one might write |
133 one might write |
131 |
134 |
132 @{ML_open [display] "(p -- q) || r" for p q r} |
135 @{ML_open [display] "(p -- q) || r" for p q r} |
133 |
136 |
134 However, this way is problematic for producing an appropriate error message, in case |
137 However, this parser is problematic for producing an appropriate error message, in case |
135 the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information |
138 the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information |
136 that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p} |
139 that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p} |
137 is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the |
140 is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the |
138 alternative parser @{ML_text r} will be tried. In many circumstances this will be the wrong |
141 alternative parser @{ML_text r} will be tried. However in many circumstanes this will be the wrong |
139 parser for the input and therefore probably fail. However, the error message is then caused by the |
142 parser for the input and therefore will fail. The error message is then caused by the |
140 failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations |
143 failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations |
141 can be avoided using the funtion @{ML "(op !!)"}, which aborts the whole process of |
144 can be avoided using the funtion @{ML "(op !!)"}. This function aborts the whole process of |
142 parsing and invokes an error message. For example if we invoke the parser |
145 parsing in case of failure and invokes an error message. For example if we invoke the parser |
143 |
146 |
144 @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} |
147 @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} |
145 |
148 |
146 on @{ML_text "hello"}, the parsing succeeds |
149 on @{ML_text "hello"}, the parsing succeeds |
147 |
150 |
148 @{ML_response [display] |
151 @{ML_response [display] |
149 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" |
152 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" |
150 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
153 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
151 |
154 |
152 In contrast if we invoke it on @{ML_text "world"} |
155 but if we invoke it on @{ML_text "world"} |
153 |
156 |
154 @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"} |
157 @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" |
|
158 "Exception ABORT raised"} |
155 |
159 |
156 the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to |
160 the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to |
157 see the error message properly, we need to prefix the parser with the function |
161 see the error message properly, we need to prefix the parser with the function |
158 @{ML "Scan.error"}. For example |
162 @{ML "Scan.error"}. For example |
159 |
163 |
160 @{ML [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"} |
164 @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))" |
|
165 "Exception Error \"foo\" raised"} |
161 |
166 |
162 This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} |
167 This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} |
163 (FIXME: see below). |
168 (FIXME: give reference to late). |
164 |
169 |
165 Lets return to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want |
170 Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want |
166 to generate the correct error message for @{ML_text q} not following @{ML_text p}, then |
171 to generate the correct error message for @{ML_text q} not following @{ML_text p}, then |
167 we have to write |
172 we have to write |
168 *} |
173 *} |
169 |
174 |
170 ML {* |
175 ML {* |
171 fun p_followed_by_q p q r = |
176 fun p_followed_by_q p q r = |
172 let |
177 let |
173 val err = (fn _ => p ^ " is not followed by " ^ q) |
178 val err = (fn _ => p ^ " is not followed by " ^ q) |
174 in |
179 in |
175 (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r)) |
180 (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r)) |
176 end |
181 end |
177 *} |
182 *} |
178 |
183 |
|
184 |
179 text {* |
185 text {* |
180 Running this parser with |
186 Running this parser with |
181 |
187 |
182 @{ML_text [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"} |
188 @{ML_response_fake [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")" |
183 |
189 "Exception ERROR \"h is not followed by e\" raised"} |
184 gives the correct error message and running it with |
190 |
|
191 gives the correct error message. Running it with |
185 |
192 |
186 @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")" |
193 @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")" |
187 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
194 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
188 |
195 |
189 yields the expected parsing. |
196 yields the expected parsing. |
190 |
197 |
191 The function @{ML "Scan.repeat"} will apply a parser as often as it succeeds. For examle |
198 The function @{ML_open "Scan.repeat p" for p} will apply a parser @{ML_text p} as |
192 |
199 often as it succeeds. For example |
193 @{ML_response "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" |
200 |
|
201 @{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" |
194 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
202 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
195 |
203 |
196 Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function |
204 Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function |
197 @{ML "Scan.repeat1"} is similar, but requires that in @{ML_open "Scan.repeat1 p" for p} |
205 @{ML "Scan.repeat1"} is similar, but requires that the parser @{ML_text "p"} |
198 the parse @{ML_text "p"} succeeds at least once. |
206 succeeds at least once. |
199 *} |
207 *} |
200 |
208 |
201 text {* |
209 text {* |
202 After parsing succeeded, one wants to apply functions on the parsed items. This is |
210 After parsing succeeded, one nearly always wants to apply a function on the parsed |
203 done using the function @{ML_open "(p >> f)" for p f} which applies first the |
211 items. This is done using the function @{ML_open "(p >> f)" for p f} which runs |
204 parser @{ML_text p} upon successful completion applies the function @{ML_text f}. |
212 first the parser @{ML_text p} and upon successful completion applies the |
205 For example |
213 function @{ML_text f} to the result. For example |
206 |
214 |
207 @{ML_response [display] |
215 @{ML_response [display] |
208 "let |
216 "let |
209 fun double (x,y) = (x^x,y^y) |
217 fun double (x,y) = (x^x,y^y) |
210 in |
218 in |
211 (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\") |
219 (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\") |
212 end" |
220 end" |
213 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} |
221 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} |
214 |
222 |
|
223 doubles the two parsed input strings. |
|
224 |
215 The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies |
225 The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies |
216 the given parser to the second component of the pair and leaves the first component |
226 the given parser to the second component of the pair and leaves the first component |
217 untouched. For example |
227 untouched. For example |
218 |
228 |
219 @{ML_response [display] |
229 @{ML_response [display] |
221 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} |
231 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} |
222 |
232 |
223 (FIXME: In which situations is this useful?) |
233 (FIXME: In which situations is this useful?) |
224 *} |
234 *} |
225 |
235 |
226 section {* Parsing Tokens *} |
236 section {* Parsing Theory Syntax *} |
227 |
237 |
228 text {* |
238 text {* |
229 Most of the time, however, we will have to deal with tokens that are not just strings. |
239 Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings. |
230 The parsers for the theory syntax, as well as the parsers for the argument syntax |
240 This is because the parsers for the theory syntax, as well as the parsers for the |
231 of proof methods and attributes use the token type @{ML_type OuterParse.token}, |
241 argument syntax of proof methods and attributes, use the type |
232 which is identical to @{ML_type OuterLex.token}. |
242 @{ML_type OuterParse.token} (which is identical to the type @{ML_type OuterLex.token}). |
233 The parser functions for the theory syntax are contained in the structure |
243 The parser functions for the theory syntax are contained in the structure |
234 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
244 @{ML_struct OuterParse} defined in the file\linebreak @{ML_file "Pure/Isar/outer_parse.ML"}. |
235 *} |
245 *} |
|
246 |
|
247 ML {* map OuterLex.mk_text (explode "hello") *} |
|
248 |
|
249 text {* |
|
250 |
|
251 @{ML_response_fake [display] "map OuterLex.mk_text (explode \"hello\")" |
|
252 "[Token (\<dots>), Token (\<dots>), Token (\<dots>), Token (\<dots>), Token (\<dots>)]"} |
|
253 |
|
254 *} |
|
255 |
|
256 ML {* |
|
257 OuterLex.mk_text "hello" |
|
258 *} |
|
259 |
|
260 ML {* |
|
261 |
|
262 let val input = [OuterLex.mk_text "hello", OuterLex.mk_text "world"]; |
|
263 in (Scan.one (fn _ => true)) input end |
|
264 |
|
265 *} |
|
266 |
236 |
267 |
237 |
268 |
238 chapter {* Parsing *} |
269 chapter {* Parsing *} |
239 |
270 |
240 text {* |
271 text {* |