48 |
48 |
49 @{ML_text [display] "($$ \"x\") (explode \"world\")"} |
49 @{ML_text [display] "($$ \"x\") (explode \"world\")"} |
50 |
50 |
51 There are three exceptions used in the parsing combinators: |
51 There are three exceptions used in the parsing combinators: |
52 |
52 |
53 (FIXME: describe) |
53 (FIXME: describe exceptions) |
54 |
54 |
55 \begin{itemize} |
55 \begin{itemize} |
56 \item @{ML_text "FAIL"} |
56 \item @{ML_text "FAIL"} |
57 \item @{ML_text "MORE"} |
57 \item @{ML_text "MORE"} @{ML_text "($$ \"h\") []"} |
58 \item @{ML_text "ABORT"} |
58 \item @{ML_text "ABORT"} dead end |
59 \end{itemize} |
59 \end{itemize} |
60 |
60 |
61 Slightly more general than @{ML "(op $$)"} is the function @{ML Scan.one} in that it |
61 Slightly more general than @{ML "(op $$)"} is the function @{ML Scan.one} in that it |
62 takes a predicate as argument and then parses exactly one item from the input list |
62 takes a predicate as argument and then parses exactly one item from the input list |
63 satisfying this prediate. For example the following parser either consumes an @{ML_text "h"} |
63 satisfying this prediate. For example the following parser either consumes an @{ML_text "h"} |
64 or a @{ML_text "w"}: |
64 or a @{ML_text "w"}: |
65 |
65 |
66 @{ML_response [display] |
66 @{ML_response [display] |
67 "let val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\") |
67 "let |
|
68 val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\") |
68 val input1 = (explode \"hello\") |
69 val input1 = (explode \"hello\") |
69 val input2 = (explode \"world\") |
70 val input2 = (explode \"world\") |
70 in |
71 in |
71 (hw input1, hw input2) |
72 (hw input1, hw input2) |
72 end" |
73 end" |
84 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the |
85 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the |
85 parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, |
86 parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, |
86 otherwise it returns the result of @{ML_text "q"}. For example |
87 otherwise it returns the result of @{ML_text "q"}. For example |
87 |
88 |
88 @{ML_response [display] |
89 @{ML_response [display] |
89 "let val hw = ($$ \"h\") || ($$ \"w\") |
90 "let |
|
91 val hw = ($$ \"h\") || ($$ \"w\") |
90 val input1 = (explode \"hello\") |
92 val input1 = (explode \"hello\") |
91 val input2 = (explode \"world\") |
93 val input2 = (explode \"world\") |
92 in |
94 in |
93 (hw input1, hw input2) |
95 (hw input1, hw input2) |
94 end" |
96 end" |
95 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
97 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
96 |
98 |
97 will in the first case consume the @{ML_text "h"} and in the second the @{ML_text "w"}. |
|
98 The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion |
99 The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion |
99 for parsers, except that they discard the item parsed by the first (respectively second) |
100 for parsers, except that they discard the item parsed by the first (respectively second) |
100 parser. For example |
101 parser. For example |
101 |
102 |
102 @{ML_response [display] |
103 @{ML_response [display] |
103 "let val just_h = ($$ \"h\") |-- ($$ \"e\") |
104 "let |
|
105 val just_h = ($$ \"h\") |-- ($$ \"e\") |
104 val just_e = ($$ \"h\") --| ($$ \"e\") |
106 val just_e = ($$ \"h\") --| ($$ \"e\") |
105 val input = (explode \"hello\") |
107 val input = (explode \"hello\") |
106 in |
108 in |
107 (just_h input, just_e input) |
109 (just_h input, just_e input) |
108 end" |
110 end" |
111 The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser |
113 The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser |
112 @{ML_text "p"}, if it succeeds; otherwise it returns |
114 @{ML_text "p"}, if it succeeds; otherwise it returns |
113 the default value @{ML_text "x"}. For example |
115 the default value @{ML_text "x"}. For example |
114 |
116 |
115 @{ML_response [display] |
117 @{ML_response [display] |
116 "let val p = Scan.optional ($$ \"h\") \"x\" |
118 "let |
|
119 val p = Scan.optional ($$ \"h\") \"x\" |
117 val input1 = (explode \"hello\") |
120 val input1 = (explode \"hello\") |
118 val input2 = (explode \"world\") |
121 val input2 = (explode \"world\") |
119 in |
122 in |
120 (p input1, p input2) |
123 (p input1, p input2) |
121 end" |
124 end" |
122 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
125 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
123 |
126 |
124 The function @{ML "(op !!)"} helps to produce appropriate error messages |
127 The function @{ML "(op !!)"} helps to produce appropriate error messages |
125 during parsing. |
128 during parsing. For example if one wants to parse @{ML_text p} immediately |
126 |
129 followed by @{ML_text q}, or start a completely different parser @{ML_text r}, |
|
130 one might write |
|
131 |
|
132 @{ML_open [display] "(p -- q) || r" for p q r} |
|
133 |
|
134 However, this way is problematic for producing an appropriate error message, in case |
|
135 the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information |
|
136 that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p} |
|
137 is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the |
|
138 alternative parser @{ML_text r} will be tried. In many circumstances this will be the wrong |
|
139 parser for the input and therefore probably fail. However, the error message is then caused by the |
|
140 failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations |
|
141 can be avoided using the funtion @{ML "(op !!)"}, which aborts the whole process of |
|
142 parsing and invokes an error message. For example if we invoke the parser |
|
143 |
|
144 @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"} |
|
145 |
|
146 on @{ML_text "hello"}, the parsing succeeds |
|
147 |
|
148 @{ML_response [display] |
|
149 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" |
|
150 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
|
151 |
|
152 In contrast if we invoke it on @{ML_text "world"} |
|
153 |
|
154 @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"} |
|
155 |
|
156 the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to |
|
157 see the error message properly, we need to prefix the parser with the function |
|
158 @{ML "Scan.error"}. For example |
|
159 |
|
160 @{ML [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"} |
|
161 |
|
162 This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} |
|
163 (FIXME: see below). |
|
164 |
|
165 Lets return to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want |
|
166 to generate the correct error message for @{ML_text q} not following @{ML_text p}, then |
|
167 we have to write |
127 *} |
168 *} |
128 |
169 |
129 ML {* |
170 ML {* |
130 |
171 fun p_followed_by_q p q r = |
131 val err_fn = (fn _ => "foo"); |
172 let |
132 val p = (!! err_fn ($$ "h")) || ($$ "w"); |
173 val err = (fn _ => p ^ " is not followed by " ^ q) |
133 val input1 = (explode "hello"); |
174 in |
134 val input2 = (explode "world"); |
175 (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r)) |
135 *} |
176 end |
136 |
177 *} |
137 ML {* |
178 |
138 |
179 text {* |
139 (*Scan.error p input2;*) |
180 Running this parser with |
140 *} |
181 |
141 |
182 @{ML_text [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"} |
142 text {* (FIXME: why does @{ML_text "p input2"} not do anything with foo?) *} |
183 |
143 |
184 gives the correct error message and running it with |
144 text {* (FIXME: explain function application) *} |
185 |
145 |
186 @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")" |
146 ML {* fun parse_fn (x,y) = (x,y^y) *} |
187 "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"} |
147 |
188 |
148 ML {* ((($$ "h") -- ($$ "e")) >> parse_fn) (explode "hello") *} |
189 yields the expected parsing. |
149 |
190 |
150 text {* (FIXME: explain @{ML_text "lift"}) *} |
191 The function @{ML "Scan.repeat"} will apply a parser as often as it succeeds. For examle |
|
192 |
|
193 @{ML_response "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" |
|
194 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"} |
|
195 |
|
196 Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function |
|
197 @{ML "Scan.repeat1"} is similar, but requires that in @{ML_open "Scan.repeat1 p" for p} |
|
198 the parse @{ML_text "p"} succeeds at least once. |
|
199 *} |
|
200 |
|
201 text {* |
|
202 After parsing succeeded, one wants to apply functions on the parsed items. This is |
|
203 done using the function @{ML_open "(p >> f)" for p f} which applies first the |
|
204 parser @{ML_text p} upon successful completion applies the function @{ML_text f}. |
|
205 For example |
|
206 |
|
207 @{ML_response [display] |
|
208 "let |
|
209 fun double (x,y) = (x^x,y^y) |
|
210 in |
|
211 (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\") |
|
212 end" |
|
213 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"} |
|
214 |
|
215 The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies |
|
216 the given parser to the second component of the pair and leaves the first component |
|
217 untouched. For example |
|
218 |
|
219 @{ML_response [display] |
|
220 "Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))" |
|
221 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"} |
|
222 |
|
223 (FIXME: In which situations is this useful?) |
|
224 *} |
|
225 |
|
226 section {* Parsing Tokens *} |
|
227 |
|
228 text {* |
|
229 Most of the time, however, we will have to deal with tokens that are not just strings. |
|
230 The parsers for the theory syntax, as well as the parsers for the argument syntax |
|
231 of proof methods and attributes use the token type @{ML_type OuterParse.token}, |
|
232 which is identical to @{ML_type OuterLex.token}. |
|
233 The parser functions for the theory syntax are contained in the structure |
|
234 @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. |
|
235 *} |
|
236 |
151 |
237 |
152 chapter {* Parsing *} |
238 chapter {* Parsing *} |
153 |
239 |
154 text {* |
240 text {* |
155 |
241 |