81 in |
81 in |
82 (hw input1, hw input2) |
82 (hw input1, hw input2) |
83 end" |
83 end" |
84 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
84 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
85 |
85 |
86 Two parser can be connected in sequence by using the function @{ML "--"}. |
86 Two parsers can be connected in sequence by using the function @{ML "--"}. |
87 For example parsing @{text "h"}, @{text "e"} and @{text "l"} in this |
87 For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this |
88 sequence you can achieve by: |
88 order) you can achieve by: |
89 |
89 |
90 @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" |
90 @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" |
91 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
91 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
92 |
92 |
93 Note how the result of consumed strings builds up on the left as nested pairs. |
93 Note how the result of consumed strings builds up on the left as nested pairs. |
97 |
97 |
98 @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")" |
98 @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")" |
99 "(\"hell\", [\"o\"])"} |
99 "(\"hell\", [\"o\"])"} |
100 |
100 |
101 Parsers that explore alternatives can be constructed using the function @{ML |
101 Parsers that explore alternatives can be constructed using the function @{ML |
102 "||"}. For example, the parser @{ML "(p || q)" for p q} returns the |
102 "||"}. The parser @{ML "(p || q)" for p q} returns the |
103 result of @{text "p"}, in case it succeeds, otherwise it returns the |
103 result of @{text "p"}, in case it succeeds, otherwise it returns the |
104 result of @{text "q"}. For example: |
104 result of @{text "q"}. For example: |
105 |
105 |
106 |
106 |
107 @{ML_response [display,gray] |
107 @{ML_response [display,gray] |
153 in |
153 in |
154 (p input1, p input2) |
154 (p input1, p input2) |
155 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
155 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
156 |
156 |
157 The function @{ML "!!"} helps to produce appropriate error messages |
157 The function @{ML "!!"} helps to produce appropriate error messages |
158 during parsing. For example if you want to parse that @{text p} is immediately |
158 for parsing. For example if you want to parse @{text p} immediately |
159 followed by @{text q}, or start a completely different parser @{text r}, |
159 followed by @{text q}, or start a completely different parser @{text r}, |
160 you might write: |
160 you might write: |
161 |
161 |
162 @{ML [display,gray] "(p -- q) || r" for p q r} |
162 @{ML [display,gray] "(p -- q) || r" for p q r} |
163 |
163 |
164 However, this parser is problematic for producing an appropriate error |
164 However, this parser is problematic for producing an appropriate error |
165 message, in case the parsing of @{ML "(p -- q)" for p q} fails. Because in |
165 message, if the parsing of @{ML "(p -- q)" for p q} fails. Because in that |
166 that case you lose the information that @{text p} should be followed by |
166 case you lose the information that @{text p} should be followed by @{text q}. |
167 @{text q}. To see this consider the case in which @{text p} is present in |
167 To see this assume that @{text p} is present in the input, but it is not |
168 the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail |
168 followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and |
169 and the alternative parser @{text r} will be tried. However in many |
169 hence the alternative parser @{text r} will be tried. However, in many |
170 circumstance this will be the wrong parser for the input ``p-followed-by-q'' |
170 circumstances this will be the wrong parser for the input ``p-followed-by-something'' |
171 and therefore will also fail. The error message is then caused by the |
171 and therefore will also fail. The error message is then caused by the failure |
172 failure of @{text r}, not by the absence of @{text q} in the input. This |
172 of @{text r}, not by the absence of @{text q} in the input. This kind of |
173 kind of situation can be avoided when using the function @{ML "!!"}. |
173 situation can be avoided when using the function @{ML "!!"}. This function |
174 This function aborts the whole process of parsing in case of a |
174 aborts the whole process of parsing in case of a failure and prints an error |
175 failure and prints an error message. For example if you invoke the parser |
175 message. For example if you invoke the parser |
176 |
176 |
177 |
177 |
178 @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"} |
178 @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"} |
179 |
179 |
180 on @{text [quotes] "hello"}, the parsing succeeds |
180 on @{text [quotes] "hello"}, the parsing succeeds |
210 ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r) |
210 ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r) |
211 end *} |
211 end *} |
212 |
212 |
213 |
213 |
214 text {* |
214 text {* |
215 Running this parser with the @{text [quotes] "h"} and @{text [quotes] "e"}, and |
215 Running this parser with the arguments |
|
216 @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and |
216 the input @{text [quotes] "holle"} |
217 the input @{text [quotes] "holle"} |
217 |
218 |
218 @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")" |
219 @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")" |
219 "Exception ERROR \"h is not followed by e\" raised"} |
220 "Exception ERROR \"h is not followed by e\" raised"} |
220 |
221 |
289 Scan.finite Symbol.stopper p input2) |
290 Scan.finite Symbol.stopper p input2) |
290 end" |
291 end" |
291 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
292 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), |
292 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
293 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} |
293 |
294 |
294 After parsing is done, you nearly always want to apply a function on the parsed |
295 After parsing is done, you almost always want to apply a function to the parsed |
295 items. One way to do this is the function @{ML "(p >> f)" for p f}, which runs |
296 items. One way to do this is the function @{ML "(p >> f)" for p f}, which runs |
296 first the parser @{text p} and upon successful completion applies the |
297 first the parser @{text p} and upon successful completion applies the |
297 function @{text f} to the result. For example |
298 function @{text f} to the result. For example |
298 |
299 |
299 @{ML_response [display,gray] |
300 @{ML_response [display,gray] |
337 |
338 |
338 (FIXME: In which situations is this useful? Give examples.) |
339 (FIXME: In which situations is this useful? Give examples.) |
339 |
340 |
340 \begin{exercise}\label{ex:scancmts} |
341 \begin{exercise}\label{ex:scancmts} |
341 Write a parser that parses an input string so that any comment enclosed |
342 Write a parser that parses an input string so that any comment enclosed |
342 inside @{text "(*\<dots>*)"} is replaced by a the same comment but enclosed inside |
343 within @{text "(*\<dots>*)"} is replaced by the same comment but enclosed within |
343 @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the |
344 @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the |
344 function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML |
345 function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML |
345 "s1 ^ s ^ s2" for s1 s2 s}. |
346 "s1 ^ s ^ s2" for s1 s2 s}. |
346 \end{exercise} |
347 \end{exercise} |
347 *} |
348 *} |
518 |
519 |
519 (FIXME: or give parser for numbers) |
520 (FIXME: or give parser for numbers) |
520 |
521 |
521 Whenever there is a possibility that the processing of user input can fail, |
522 Whenever there is a possibility that the processing of user input can fail, |
522 it is a good idea to give as much information about where the error |
523 it is a good idea to give as much information about where the error |
523 occured. For this Isabelle can attach positional information to tokens |
524 occurred. For this Isabelle can attach positional information to tokens |
524 and then thread this information up the processing chain. To see this, |
525 and then thread this information up the processing chain. To see this, |
525 modify the function @{ML filtered_input} described earlier to |
526 modify the function @{ML filtered_input} described earlier to |
526 *} |
527 *} |
527 |
528 |
528 ML{*fun filtered_input' str = |
529 ML{*fun filtered_input' str = |
613 section {* Parsing Specifications\label{sec:parsingspecs} *} |
614 section {* Parsing Specifications\label{sec:parsingspecs} *} |
614 |
615 |
615 text {* |
616 text {* |
616 There are a number of special purpose parsers that help with parsing |
617 There are a number of special purpose parsers that help with parsing |
617 specifications of function definitions, inductive predicates and so on. In |
618 specifications of function definitions, inductive predicates and so on. In |
618 Capter~\ref{chp:package}, for example, we will need to parse specifications |
619 Chapter~\ref{chp:package}, for example, we will need to parse specifications |
619 for inductive predicates of the form: |
620 for inductive predicates of the form: |
620 *} |
621 *} |
621 |
622 |
622 simple_inductive |
623 simple_inductive |
623 even and odd |
624 even and odd |
689 text {* |
690 text {* |
690 Whenever types are given, they are stored in the @{ML SOME}s. The types are |
691 Whenever types are given, they are stored in the @{ML SOME}s. The types are |
691 not yet used to type the variables: this must be done by type-inference later |
692 not yet used to type the variables: this must be done by type-inference later |
692 on. Since types are part of the inner syntax they are strings with some |
693 on. Since types are part of the inner syntax they are strings with some |
693 encoded information (see previous section). If a syntax translation is |
694 encoded information (see previous section). If a syntax translation is |
694 present for a variable, then it is stored in the @{ML Mixfix} datastructure; |
695 present for a variable, then it is stored in the @{ML Mixfix} data structure; |
695 no syntax translation is indicated by @{ML NoSyn}. |
696 no syntax translation is indicated by @{ML NoSyn}. |
696 |
697 |
697 \begin{readmore} |
698 \begin{readmore} |
698 The datastructre for sytax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. |
699 The data structure for syntax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. |
699 \end{readmore} |
700 \end{readmore} |
700 |
701 |
701 Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a |
702 Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a |
702 list of introduction rules, that is propositions with theorem |
703 list of introduction rules, that is propositions with theorem |
703 annotations. The introduction rules are propositions parsed by @{ML |
704 annotations. The introduction rules are propositions parsed by @{ML |
1281 \texttt{SpecParse.token} are all the same). |
1282 \texttt{SpecParse.token} are all the same). |
1282 |
1283 |
1283 Input text is split up into tokens, and the input source type for many parsing |
1284 Input text is split up into tokens, and the input source type for many parsing |
1284 functions is \texttt{token list}. |
1285 functions is \texttt{token list}. |
1285 |
1286 |
1286 The datatype definition (which is not published in the signature) is |
1287 The data type definition (which is not published in the signature) is |
1287 \begin{verbatim} |
1288 \begin{verbatim} |
1288 datatype token = Token of Position.T * (token_kind * string); |
1289 datatype token = Token of Position.T * (token_kind * string); |
1289 \end{verbatim} |
1290 \end{verbatim} |
1290 but here are some runnable examples for viewing tokens: |
1291 but here are some runnable examples for viewing tokens: |
1291 |
1292 |
1466 |
1467 |
1467 section{* The \texttt{Args} structure *} |
1468 section{* The \texttt{Args} structure *} |
1468 |
1469 |
1469 text {* |
1470 text {* |
1470 The source file is \texttt{src/Pure/Isar/args.ML}. |
1471 The source file is \texttt{src/Pure/Isar/args.ML}. |
1471 The primary type of this structure is the \texttt{src} datatype; |
1472 The primary type of this structure is the \texttt{src} data type; |
1472 the single constructors not published in the signature, but |
1473 the single constructors not published in the signature, but |
1473 \texttt{Args.src} and \texttt{Args.dest\_src} |
1474 \texttt{Args.src} and \texttt{Args.dest\_src} |
1474 are in fact the constructor and destructor functions. |
1475 are in fact the constructor and destructor functions. |
1475 Note that the types \texttt{Attrib.src} and \texttt{Method.src} |
1476 Note that the types \texttt{Attrib.src} and \texttt{Method.src} |
1476 are in fact \texttt{Args.src}. |
1477 are in fact \texttt{Args.src}. |