on, belong to the outer syntax, whereas items inside double quotation marks, such
as terms, types and so on, belong to the inner syntax. For parsing inner syntax,
Isabelle uses a rather general and sophisticated algorithm due to Earley, which
is driven by priority grammars. Parsers for outer syntax are built up by functional
parsing combinators. These combinators are a well-established technique for parsing,
which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}.
Isabelle developers are usually concerned with writing these outer syntax parsers,
either for new definitional packages or for calling tactics with specific arguments.

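To give a first flavour of this combinator style, here is a minimal sketch: the
single-character parser @{ML_text "$$"} (explained in detail in this chapter) consumes
one character from an exploded string, and combinators then assemble such atomic
parsers into larger ones.

@{ML_response [display]
"($$ \"h\") (explode \"hello\")"
"(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
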
\begin{readmore}
The library
for parsers, except that they discard the item parsed by the first (respectively second)
parser. For example

@{ML_response [display]
"let
  val just_e = ($$ \"h\") |-- ($$ \"e\")
  val just_h = ($$ \"h\") --| ($$ \"e\")
  val input = (explode \"hello\")
in
  (just_e input, just_h input)
end"
"((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}

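For comparison, a small sketch with the plain sequencing combinator @{ML_text "--"},
which keeps both parsed items:

@{ML_response [display]
"(($$ \"h\") -- ($$ \"e\")) (explode \"hello\")"
"((\"h\", \"e\"), [\"l\", \"l\", \"o\"])"}
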
The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser
@{ML_text "p"}, if it succeeds; otherwise it returns
The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
\end{readmore}

The structure @{ML_struct OuterLex} defines several kinds of tokens (for example
@{ML "OuterLex.Ident"} for identifiers, @{ML "OuterLex.Keyword"} for keywords and
@{ML "OuterLex.Command"} for commands). Some parsers take into account the
kind of token.

We can generate a token list using the function @{ML "OuterSyntax.scan"}, to which we
pass @{ML "Position.none"} as argument below since, at the moment, we are not interested
in generating precise error messages. The following\footnote{There is something funny
going on with the pretty printing of the result token list.}

@{ML_response [display] "OuterSyntax.scan Position.none \"hello world\""
"[OuterLex.Token (\<dots>,(OuterLex.Ident, \"hello\"),\<dots>),
 OuterLex.Token (\<dots>,(OuterLex.Space, \" \"),\<dots>),
 OuterLex.Token (\<dots>,(OuterLex.Ident, \"world\"),\<dots>)]"}

produces three tokens where the first and the last are identifiers, since
@{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match
any other category. The second indicates a space. If we parse

@{ML_response [display] "OuterSyntax.scan Position.none \"inductive|for\""
"[OuterLex.Token (\<dots>,(OuterLex.Command, \"inductive\"),\<dots>),
 OuterLex.Token (\<dots>,(OuterLex.Keyword, \"|\"),\<dots>),
 OuterLex.Token (\<dots>,(OuterLex.Keyword, \"for\"),\<dots>)]"}

we obtain a list of command and keyword tokens.
If you want to see which keywords and commands are currently known, use
the following (you might have to adjust the @{ML print_depth} in order to
see the complete list):

@{ML_response_fake [display]
"let
  val (keywords, commands) = OuterKeyword.get_lexicons ()
in
  (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
end"
"([\"}\",\"{\",\<dots>],[\"\<rightleftharpoons>\",\"\<leftharpoondown>\",\<dots>])"}

Now the parser @{ML "OuterParse.$$$"} parses a single keyword. For example

@{ML_response [display]
"let
\end{verbatim}
but here are some runnable examples for viewing tokens:

*}

ML {*
(* scan a string into a list of outer syntax tokens *)
val toks = OuterSyntax.scan Position.none
  "theory,imports;begin x.y.z apply ?v1 ?'a 'a -- || 44 simp (* xx *) { * fff * }" ;
*}

ML {*
(* increase the print depth so that the complete token list is displayed *)
print_depth 20 ;
*}

ML {*
map OuterLex.text_of toks ;
*}

ML {*
val proper_toks = filter OuterLex.is_proper toks ;
*}

ML {*
map OuterLex.kind_of proper_toks
*}

ML {*
map OuterLex.unparse proper_toks ;
*}

ML {*
OuterLex.stopper
*}

text {*

The function \texttt{is\_proper : token -> bool} identifies tokens which are