5 chapter {* Parsing *} |
5 chapter {* Parsing *} |
6 |
6 |
7 text {* |
7 text {* |
8 |
8 |
9 Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. |
9 Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. |
10 Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so |
10 Commands, such as \isacommand{definition}, \isacommand{inductive} and so |
11 on, belong to the outer syntax, whereas items inside double quotation marks, such |
11 on, belong to the outer syntax, whereas terms, types and so on belong to the |
12 as terms, types and so on, belong to the inner syntax. For parsing inner syntax, |
12 inner syntax. For parsing inner syntax, |
13 Isabelle uses a rather general and sophisticated algorithm, which |
13 Isabelle uses a rather general and sophisticated algorithm, which |
14 is driven by priority grammars. Parsers for outer syntax are built up by functional |
14 is driven by priority grammars. Parsers for outer syntax are built up by functional |
15 parsing combinators. These combinators are a well-established technique for parsing, |
15 parsing combinators. These combinators are a well-established technique for parsing, |
16 which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}. |
16 which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}. |
17 Isabelle developers are usually concerned with writing these outer syntax parsers, |
17 Isabelle developers are usually concerned with writing these outer syntax parsers, |
18 either for new definitional packages or for calling methods with specific arguments. |
18 either for new definitional packages or for calling methods with specific arguments. |
19 |
19 |
20 \begin{readmore} |
20 \begin{readmore} |
21 The library |
21 The library for writing parser combinators is split up, roughly, into two |
22 for writing parser combinators is split up, roughly, into two parts. |
22 parts. The first part consists of a collection of generic parser combinators |
23 The first part consists of a collection of generic parser combinators defined |
23 defined in the structure @{ML_struct Scan} in the file @{ML_file |
24 in the structure @{ML_struct Scan} in the file |
24 "Pure/General/scan.ML"}. The second part of the library consists of |
25 @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of |
25 combinators for dealing with specific token types, which are defined in the |
26 combinators for dealing with specific token types, which are defined in the |
26 structure @{ML_struct OuterParse} in the file @{ML_file |
27 structure @{ML_struct OuterParse} in the file |
27 "Pure/Isar/outer_parse.ML"}. Specific parsers for packages are defined |
28 @{ML_file "Pure/Isar/outer_parse.ML"}. |
28 in @{ML_file "Pure/Isar/spec_parse.ML"}. Parsers for method arguments |
|
29 are defined in @{ML_file "Pure/Isar/args.ML"}. |
29 \end{readmore} |
30 \end{readmore} |
30 |
31 |
31 *} |
32 *} |
32 |
33 |
33 section {* Building Generic Parsers *} |
34 section {* Building Generic Parsers *} |
73 |
74 |
74 |
75 |
75 @{ML_response [display,gray] |
76 @{ML_response [display,gray] |
76 "let |
77 "let |
77 val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\") |
78 val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\") |
78 val input1 = (explode \"hello\") |
79 val input1 = explode \"hello\" |
79 val input2 = (explode \"world\") |
80 val input2 = explode \"world\" |
80 in |
81 in |
81 (hw input1, hw input2) |
82 (hw input1, hw input2) |
82 end" |
83 end" |
83 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
84 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
84 |
85 |
85 Two parsers can be connected in sequence by using the function @{ML "--"}. |
86 Two parsers can be connected in sequence by using the function @{ML "--"}. |
86 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
87 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
87 order) you can achieve by: |
88 order) you can achieve by: |
88 |
89 |
89 @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" |
90 @{ML_response [display,gray] |
90 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
91 "($$ \"h\" -- $$ \"e\" -- $$ \"l\") (explode \"hello\")" |
|
92 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
91 |
93 |
92 Note how the result of consumed strings builds up on the left as nested pairs. |
94 Note how the result of consumed strings builds up on the left as nested pairs. |
93 |
95 |
94 If, as in the previous example, you want to parse a particular string, |
96 If, as in the previous example, you want to parse a particular string, |
95 then you should use the function @{ML Scan.this_string}: |
97 then you should use the function @{ML Scan.this_string}: |
96 |
98 |
97 @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")" |
99 @{ML_response [display,gray] |
98 "(\"hell\", [\"o\"])"} |
100 "Scan.this_string \"hell\" (explode \"hello\")" |
|
101 "(\"hell\", [\"o\"])"} |
99 |
102 |
100 Parsers that explore alternatives can be constructed using the function @{ML |
103 Parsers that explore alternatives can be constructed using the function @{ML |
101 "||"}. The parser @{ML "(p || q)" for p q} returns the |
104 "||"}. The parser @{ML "(p || q)" for p q} returns the |
102 result of @{text "p"}, in case it succeeds, otherwise it returns the |
105 result of @{text "p"}, in case it succeeds, otherwise it returns the |
103 result of @{text "q"}. For example: |
106 result of @{text "q"}. For example: |
104 |
107 |
105 |
108 |
106 @{ML_response [display,gray] |
109 @{ML_response [display,gray] |
107 "let |
110 "let |
108 val hw = ($$ \"h\") || ($$ \"w\") |
111 val hw = $$ \"h\" || $$ \"w\" |
109 val input1 = (explode \"hello\") |
112 val input1 = explode \"hello\" |
110 val input2 = (explode \"world\") |
113 val input2 = explode \"world\" |
111 in |
114 in |
112 (hw input1, hw input2) |
115 (hw input1, hw input2) |
113 end" |
116 end" |
114 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
117 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
115 |
118 |
117 for parsers, except that they discard the item being parsed by the first (respectively second) |
120 for parsers, except that they discard the item being parsed by the first (respectively second) |
118 parser. For example: |
121 parser. For example: |
119 |
122 |
120 @{ML_response [display,gray] |
123 @{ML_response [display,gray] |
121 "let |
124 "let |
122 val just_e = ($$ \"h\") |-- ($$ \"e\") |
125 val just_e = $$ \"h\" |-- $$ \"e\" |
123 val just_h = ($$ \"h\") --| ($$ \"e\") |
126 val just_h = $$ \"h\" --| $$ \"e\" |
124 val input = (explode \"hello\") |
127 val input = explode \"hello\" |
125 in |
128 in |
126 (just_e input, just_h input) |
129 (just_e input, just_h input) |
127 end" |
130 end" |
128 "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"} |
131 "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"} |
129 |
132 |
132 the default value @{text "x"}. For example: |
135 the default value @{text "x"}. For example: |
133 |
136 |
134 @{ML_response [display,gray] |
137 @{ML_response [display,gray] |
135 "let |
138 "let |
136 val p = Scan.optional ($$ \"h\") \"x\" |
139 val p = Scan.optional ($$ \"h\") \"x\" |
137 val input1 = (explode \"hello\") |
140 val input1 = explode \"hello\" |
138 val input2 = (explode \"world\") |
141 val input2 = explode \"world\" |
139 in |
142 in |
140 (p input1, p input2) |
143 (p input1, p input2) |
141 end" |
144 end" |
142 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
145 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
143 |
146 |
145 be given. Instead, the result is wrapped as an @{text "option"}-type. For example: |
148 be given. Instead, the result is wrapped as an @{text "option"}-type. For example: |
146 |
149 |
147 @{ML_response [display,gray] |
150 @{ML_response [display,gray] |
148 "let |
151 "let |
149 val p = Scan.option ($$ \"h\") |
152 val p = Scan.option ($$ \"h\") |
150 val input1 = (explode \"hello\") |
153 val input1 = explode \"hello\" |
151 val input2 = (explode \"world\") |
154 val input2 = explode \"world\" |
152 in |
155 in |
153 (p input1, p input2) |
156 (p input1, p input2) |
154 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
157 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
155 |
158 |
156 The function @{ML "!!"} helps to produce appropriate error messages |
159 The function @{ML "!!"} helps to produce appropriate error messages |
164 message, if the parsing of @{ML "(p -- q)" for p q} fails. Because in that |
167 message, if the parsing of @{ML "(p -- q)" for p q} fails. Because in that |
165 case you lose the information that @{text p} should be followed by @{text q}. |
168 case you lose the information that @{text p} should be followed by @{text q}. |
166 To see this assume that @{text p} is present in the input, but it is not |
169 To see this assume that @{text p} is present in the input, but it is not |
167 followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and |
170 followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and |
168 hence the alternative parser @{text r} will be tried. However, in many |
171 hence the alternative parser @{text r} will be tried. However, in many |
169 circumstances this will be the wrong parser for the input ``p-followed-by-something'' |
172 circumstances this will be the wrong parser for the input ``@{text "p"}-followed-by-something'' |
170 and therefore will also fail. The error message is then caused by the failure |
173 and therefore will also fail. The error message is then caused by the failure |
171 of @{text r}, not by the absence of @{text q} in the input. This kind of |
174 of @{text r}, not by the absence of @{text q} in the input. This kind of |
172 situation can be avoided when using the function @{ML "!!"}. This function |
175 situation can be avoided when using the function @{ML "!!"}. This function |
173 aborts the whole process of parsing in case of a failure and prints an error |
176 aborts the whole process of parsing in case of a failure and prints an error |
174 message. For example if you invoke the parser |
177 message. For example if you invoke the parser |
175 |
178 |
176 |
179 |
177 @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"} |
180 @{ML [display,gray] "!! (fn _ => \"foo\") ($$ \"h\")"} |
178 |
181 |
179 on @{text [quotes] "hello"}, the parsing succeeds |
182 on @{text [quotes] "hello"}, the parsing succeeds |
180 |
183 |
181 @{ML_response [display,gray] |
184 @{ML_response [display,gray] |
182 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" |
185 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" |
183 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
186 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
184 |
187 |
185 but if you invoke it on @{text [quotes] "world"} |
188 but if you invoke it on @{text [quotes] "world"} |
186 |
189 |
187 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" |
190 @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")" |
188 "Exception ABORT raised"} |
191 "Exception ABORT raised"} |
189 |
192 |
190 then the parsing aborts and the error message @{text "foo"} is printed. In order to |
193 then the parsing aborts and the error message @{text "foo"} is printed. In order to |
191 see the error message properly, you need to prefix the parser with the function |
194 see the error message properly, you need to prefix the parser with the function |
192 @{ML "Scan.error"}. For example: |
195 @{ML "Scan.error"}. For example: |
193 |
196 |
194 @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" |
197 @{ML_response_fake [display,gray] |
195 "Exception Error \"foo\" raised"} |
198 "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))" |
|
199 "Exception Error \"foo\" raised"} |
196 |
200 |
197 This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.local_theory"} |
201 This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.local_theory"} |
198 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
202 (see Section~\ref{sec:newcommand} which explains this function in more detail). |
199 |
203 |
200 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
204 Let us now return to our example of parsing @{ML "(p -- q) || r" for p q |
201 r}. If you want to generate the correct error message for p-followed-by-q, |
205 r}. If you want to generate the correct error message for |
202 then you have to write: |
206 @{text "p"}-followed-by-@{text "q"}, then you have to write: |
203 *} |
207 *} |
204 |
208 |
205 ML{*fun p_followed_by_q p q r = |
209 ML{*fun p_followed_by_q p q r = |
206 let |
210 let |
207 val err_msg = (fn _ => p ^ " is not followed by " ^ q) |
211 val err_msg = fn _ => p ^ " is not followed by " ^ q |
208 in |
212 in |
209 ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r) |
213 ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r) |
210 end *} |
214 end *} |
211 |
215 |
212 |
216 |
251 string as in |
255 string as in |
252 |
256 |
253 @{ML_response [display,gray] |
257 @{ML_response [display,gray] |
254 "let |
258 "let |
255 val p = Scan.repeat (Scan.one Symbol.not_eof) |
259 val p = Scan.repeat (Scan.one Symbol.not_eof) |
256 val input = (explode \"foo bar foo\") |
260 val input = explode \"foo bar foo\" |
257 in |
261 in |
258 Scan.finite Symbol.stopper p input |
262 Scan.finite Symbol.stopper p input |
259 end" |
263 end" |
260 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
264 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"} |
261 |
265 |
280 symbol is a @{text [quotes] "*"}. |
284 symbol is a @{text [quotes] "*"}. |
281 |
285 |
282 @{ML_response [display,gray] |
286 @{ML_response [display,gray] |
283 "let |
287 "let |
284 val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof)) |
288 val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof)) |
285 val input1 = (explode \"fooooo\") |
289 val input1 = explode \"fooooo\" |
286 val input2 = (explode \"foo*ooo\") |
290 val input2 = explode \"foo*ooo\" |
287 in |
291 in |
288 (Scan.finite Symbol.stopper p input1, |
292 (Scan.finite Symbol.stopper p input1, |
289 Scan.finite Symbol.stopper p input2) |
293 Scan.finite Symbol.stopper p input2) |
290 end" |
294 end" |
291 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
295 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
330 The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies |
334 The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies |
331 the given parser to the second component of the pair and leaves the first component |
335 the given parser to the second component of the pair and leaves the first component |
332 untouched. For example |
336 untouched. For example |
333 |
337 |
334 @{ML_response [display,gray] |
338 @{ML_response [display,gray] |
335 "Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))" |
339 "Scan.lift ($$ \"h\" -- $$ \"e\") (1, explode \"hello\")" |
336 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} |
340 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} |
337 |
341 |
338 (FIXME: In which situations is this useful? Give examples.) |
342 (FIXME: In which situations is this useful? Give examples.) |
339 |
343 |
340 \begin{exercise}\label{ex:scancmts} |
344 \begin{exercise}\label{ex:scancmts} |
341 Write a parser that parses an input string so that any comment enclosed |
345 Write a parser that parses an input string so that any comment enclosed |
342 within @{text "(*\<dots>*)"} is replaced by the same comment but enclosed within |
346 within @{text "(*\<dots>*)"} is replaced by the same comment but enclosed within |
343 @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the |
347 @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the |
344 function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML |
348 function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML |
345 "s1 ^ s ^ s2" for s1 s2 s}. |
349 "s1 ^ s ^ s2" for s1 s2 s}. Hint: To simplify the task ignore the proper |
|
350 nesting of comments. |
346 \end{exercise} |
351 \end{exercise} |
347 *} |
352 *} |
348 |
353 |
349 section {* Parsing Theory Syntax *} |
354 section {* Parsing Theory Syntax *} |
350 |
355 |