13 as terms, types and so on, belong to the inner syntax. For parsing inner syntax, |
14 as terms, types and so on, belong to the inner syntax. For parsing inner syntax, |
14 Isabelle uses a rather general and sophisticated algorithm due to Earley, which |
15 Isabelle uses a rather general and sophisticated algorithm due to Earley, which |
15 is driven by priority grammars. Parsers for outer syntax are built up by functional |
16 is driven by priority grammars. Parsers for outer syntax are built up by functional |
16 parsing combinators. These combinators are a well-established technique for parsing, |
17 parsing combinators. These combinators are a well-established technique for parsing, |
17 which has, for example, been described in Paulson's classic book \cite{paulson-ml2}. |
18 which has, for example, been described in Paulson's classic book \cite{paulson-ml2}. |
18 |
19 Isabelle developers are usually concerned with writing these outer syntax parsers, |
19 Isabelle developers are usually concerned with writing parsers for outer |
20 either for new definitional packages or for calling tactics with specific arguments. |
20 syntax, either for new definitional packages or for calling tactics. The library |
21 |
21 for writing such parsers in can roughly be split up into two parts. |
22 \begin{readmore} |
|
23 The library |
|
24 for writing parser combinators can be split up, roughly, into two parts. |
22 The first part consists of a collection of generic parser combinators defined |
25 The first part consists of a collection of generic parser combinators defined |
23 in the structure @{ML_struct Scan} in the file |
26 in the structure @{ML_struct Scan} in the file |
24 @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of |
27 @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of |
25 combinators for dealing with specific token types, which are defined in the |
28 combinators for dealing with specific token types, which are defined in the |
26 structure @{ML_struct OuterParse} in the file |
29 structure @{ML_struct OuterParse} in the file |
27 @{ML_file "Pure/Isar/outer_parse.ML"}. |
30 @{ML_file "Pure/Isar/outer_parse.ML"}. |
|
31 \end{readmore} |
28 |
32 |
29 *} |
33 *} |
30 |
34 |
31 section {* Building Up Generic Parsers *} |
35 section {* Building Up Generic Parsers *} |
32 |
36 |
47 @{ML_response_fake [display] "($$ \"x\") (explode \"world\")" |
51 @{ML_response_fake [display] "($$ \"x\") (explode \"world\")" |
48 "Exception FAIL raised"} |
52 "Exception FAIL raised"} |
49 |
53 |
50 will raise the exception @{ML_text "FAIL"}. |
54 will raise the exception @{ML_text "FAIL"}. |
51 There are three exceptions used in the parsing combinators: |
55 There are three exceptions used in the parsing combinators: |
|
56 |
|
57 (FIXME: do the exception need to be explained, because the user cannot use them from ``outside''?) |
52 |
58 |
53 \begin{itemize} |
59 \begin{itemize} |
54 \item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing |
60 \item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing |
55 might be explored. |
61 might be explored. |
56 \item @{ML_text "MORE"} indicates that there is not enough input for the parser. For example |
62 \item @{ML_text "MORE"} indicates that there is not enough input for the parser. For example |
84 |
90 |
85 Note how the result of consumed strings builds up on the left as nested pairs. |
91 Note how the result of consumed strings builds up on the left as nested pairs. |
86 |
92 |
87 Parsers that explore |
93 Parsers that explore |
88 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the |
94 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the |
89 parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, if it succeeds, |
95 parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds, |
90 otherwise it returns the result of @{ML_text "q"}. For example |
96 otherwise it returns the result of @{ML_text "q"}. For example |
91 |
97 |
92 @{ML_response [display] |
98 @{ML_response [display] |
93 "let |
99 "let |
94 val hw = ($$ \"h\") || ($$ \"w\") |
100 val hw = ($$ \"h\") || ($$ \"w\") |
133 one might write |
139 one might write |
134 |
140 |
135 @{ML_open [display] "(p -- q) || r" for p q r} |
141 @{ML_open [display] "(p -- q) || r" for p q r} |
136 |
142 |
137 However, this parser is problematic for producing an appropriate error message, in case |
143 However, this parser is problematic for producing an appropriate error message, in case |
138 the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information |
144 the parsing of @{ML_open "(p -- q)" for p q} fails. Because in that case one loses with the parser |
139 that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p} |
145 above the information |
140 is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the |
146 that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in |
|
147 which @{ML_text p} |
|
148 is present in the input, but not @{ML_text q}. That means @{ML_open "(p -- q)" for p q} will fail |
|
149 and the |
141 alternative parser @{ML_text r} will be tried. However in many circumstanes this will be the wrong |
150 alternative parser @{ML_text r} will be tried. However in many circumstanes this will be the wrong |
142 parser for the input and therefore will fail. The error message is then caused by the |
151 parser for the input ``p-followed-by-q'' and therefore will fail. The error message is then caused by the |
143 failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations |
152 failure of @{ML_text r}, not by the absense of @{ML_text q} in the input. This kind of situation |
144 can be avoided using the funtion @{ML "(op !!)"}. This function aborts the whole process of |
153 can be avoided by using the funtion @{ML "(op !!)"}. This function aborts the whole process of |
145 parsing in case of failure and invokes an error message. For example if we invoke the parser |
154 parsing in case of failure and invokes an error message. For example if we invoke the parser |
146 |
155 |
147 @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} |
156 @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} |
148 |
157 |
149 on @{ML_text "hello"}, the parsing succeeds |
158 on @{ML_text [quotes] "hello"}, the parsing succeeds |
150 |
159 |
151 @{ML_response [display] |
160 @{ML_response [display] |
152 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" |
161 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" |
153 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
162 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
154 |
163 |
155 but if we invoke it on @{ML_text "world"} |
164 but if we invoke it on @{ML_text [quotes] "world"} |
156 |
165 |
157 @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" |
166 @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" |
158 "Exception ABORT raised"} |
167 "Exception ABORT raised"} |
159 |
168 |
160 the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to |
169 the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to |
163 |
172 |
164 @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))" |
173 @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))" |
165 "Exception Error \"foo\" raised"} |
174 "Exception Error \"foo\" raised"} |
166 |
175 |
167 This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} |
176 This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} |
168 (FIXME: give reference to late). |
177 (FIXME: give reference to later place). |
169 |
178 |
170 Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want |
179 Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want |
171 to generate the correct error message for @{ML_text q} not following @{ML_text p}, then |
180 to generate the correct error message for p-followed-by-q, then |
172 we have to write |
181 we have to write |
173 *} |
182 *} |
174 |
183 |
175 ML {* |
184 ML {* |
176 fun p_followed_by_q p q r = |
185 fun p_followed_by_q p q r = |
177 let |
186 let |
178 val err = (fn _ => p ^ " is not followed by " ^ q) |
187 val err = (fn _ => p ^ " is not followed by " ^ q) |
179 in |
188 in |
180 (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r)) |
189 (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r)) |
238 text {* |
247 text {* |
239 Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings. |
248 Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings. |
240 This is because the parsers for the theory syntax, as well as the parsers for the |
249 This is because the parsers for the theory syntax, as well as the parsers for the |
241 argument syntax of proof methods and attributes, use the type |
250 argument syntax of proof methods and attributes, use the type |
242 @{ML_type OuterParse.token} (which is identical to the type @{ML_type OuterLex.token}). |
251 @{ML_type OuterParse.token} (which is identical to the type @{ML_type OuterLex.token}). |
|
252 |
|
253 \begin{readmore} |
243 The parser functions for the theory syntax are contained in the structure |
254 The parser functions for the theory syntax are contained in the structure |
244 @{ML_struct OuterParse} defined in the file\linebreak @{ML_file "Pure/Isar/outer_parse.ML"}. |
255 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
|
256 The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}. |
|
257 \end{readmore} |
|
258 *} |
|
259 |
|
260 ML {* OuterLex.mk_text "hello" *} |
|
261 |
|
262 ML {* print_depth 50 *} |
|
263 |
|
264 ML {* |
|
265 let open OuterLex in |
|
266 OuterSyntax.scan Position.none "for" |
|
267 end; |
|
268 |
245 *} |
269 *} |
246 |
270 |
247 ML {* map OuterLex.mk_text (explode "hello") *} |
271 ML {* map OuterLex.mk_text (explode "hello") *} |
|
272 |
|
273 ML {* |
|
274 |
|
275 fun my_scan lex pos str = |
|
276 Source.of_string str |
|
277 |> Symbol.source {do_recover = false} |
|
278 |> OuterLex.source {do_recover = SOME false} |
|
279 (fn () => pairself (Scan.make_lexicon o map Symbol.explode) lex) pos |
|
280 |> Source.exhaust; |
|
281 |
|
282 *} |
|
283 |
|
284 ML {* |
|
285 let val toks = my_scan (["hello"], []) Position.none "hello" |
|
286 in (OuterParse.$$$ "hello") toks end |
|
287 *} |
248 |
288 |
249 text {* |
289 text {* |
250 |
290 |
251 @{ML_response_fake [display] "map OuterLex.mk_text (explode \"hello\")" |
291 @{ML_response_fake [display] "map OuterLex.mk_text (explode \"hello\")" |
252 "[Token (\<dots>), Token (\<dots>), Token (\<dots>), Token (\<dots>), Token (\<dots>)]"} |
292 "[Token (\<dots>), Token (\<dots>), Token (\<dots>), Token (\<dots>), Token (\<dots>)]"} |