1 theory Parsing |
1 theory Parsing |
2 imports Main |
2 imports Base |
3 |
3 |
4 begin |
4 begin |
|
5 |
|
6 chapter {* Parsing *} |
|
7 |
|
8 text {* |
|
9 |
|
10 Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. |
|
11 Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so |
|
12 on, belong to the outer syntax, whereas items inside double quotation marks, such |
|
13 as terms and types, belong to the inner syntax. For parsing inner syntax, |
|
14 Isabelle uses a rather general and sophisticated algorithm due to Earley, which |
|
15 is driven by priority grammars. Parsers for outer syntax are built up by functional |
|
16 parsing combinators. These combinators are a well-established technique for parsing, |
|
17 which has, for example, been described in Paulson's classic book \cite{paulson-ml2}. |
|
18 |
|
19 Isabelle developers are usually concerned with writing parsers for outer |
|
20 syntax, either for new definitional packages or for calling tactics. The library |
|
21 for writing such parsers in can roughly be split up into two parts. |
|
22 The first part consists of a collection of generic parser combinators defined |
|
23 in the structure @{ML_struct Scan} in the file |
|
24 @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of |
|
25 combinators for dealing with specific token types, which are defined in the |
|
26 structure @{ML_struct OuterParse} in the file |
|
27 @{ML_file "Pure/Isar/outer_parse.ML"}. |
|
28 |
|
29 *} |
|
30 |
|
31 section {* Building Up Generic Parsers *} |
|
32 |
|
33 text {* |
|
34 Let us first have a look at parsing strings using generic parsing combinators. |
|
35 The function @{ML "$$"} takes a string and will ``consume'' this string from |
|
36 a given input list of strings. Consume in this context means that it will |
|
37 return a pair consisting of this string and the rest of the list. |
|
38 For example: |
|
39 *} |
|
40 |
|
41 ML {* ($$ "h") (explode "hello"); |
|
42 ($$ "w") (explode "world") *} |
|
43 |
|
44 text {* |
|
45 returns @{ML "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} and then |
|
46 @{ML "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}. This function will either succeed (as in |
|
47 the two examples above) or raise the exeption @{ML_text "FAIL"} if no string |
|
48 can be consumed: for example @{ML_text "($$ \"x\") (explode \"world\")"}. |
|
49 |
|
50 Two such parser can be connected in sequence using the funtion @{ML "(op --)"}. |
|
51 For example |
|
52 *} |
|
53 |
|
54 ML {* (($$ "h") -- ($$ "e") -- ($$ "l")) (explode "hello") *} |
|
55 |
|
56 text {* |
|
57 returns @{ML "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}. Note how the result of |
|
58 consumed strings builds up on the left as nested pairs. |
|
59 |
|
60 Parsers that explore |
|
61 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the |
|
62 parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, |
|
63 otherwise it returns the result of @{ML_text "q"}. For example |
|
64 *} |
|
65 |
|
66 ML {* |
|
67 let val hw = ($$ "h") || ($$ "w") |
|
68 val input1 = (explode "hello") |
|
69 val input2 = (explode "world") |
|
70 in |
|
71 (hw input1, hw input2) |
|
72 end |
|
73 *} |
|
74 |
|
75 text {* |
|
76 will in the first case consume the @{ML_text "h"} and in the second the @{ML_text "w"}. |
|
77 The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion |
|
78 for parsers, except that they discard the item parsed by the first (respectively second) |
|
79 parser. For example |
|
80 *} |
|
81 |
|
82 ML {* |
|
83 let val just_h = ($$ "h") --| ($$ "e") |
|
84 val just_e = ($$ "h") |-- ($$ "e") |
|
85 val input = (explode "hello") |
|
86 in |
|
87 (just_h input, just_e input) |
|
88 end |
|
89 *} |
|
90 |
|
91 text {* |
|
92 produces @{ML "(\"h\", [\"l\", \"l\", \"o\"])"} and @{ML "(\"e\", [\"l\", \"l\", \"o\"])"}, |
|
93 respectively. |
|
94 |
|
95 (FIXME does it make sense to explain the other functions from @{ML_text BASIC_SCAN}?) |
|
96 |
|
97 The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser |
|
98 @{ML_text "p"}, if it succeeds; otherwise it returns |
|
99 the default value @{ML_text "x"}. For example |
|
100 *} |
|
101 |
|
102 ML {* |
|
103 let val p = Scan.optional ($$ "h") "x" |
|
104 val input1 = (explode "hello") |
|
105 val input2 = (explode "world") |
|
106 in |
|
107 (p input1, p input2) |
|
108 end |
|
109 *} |
|
110 |
|
111 text {* |
|
112 returns @{ML "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} in the first case and |
|
113 in the second case @{ML "(\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"])"}. |
|
114 |
|
115 *} |
|
116 |
|
117 text {* (FIXME: explain @{ML "op !!"}) *} |
|
118 |
|
119 ML {* |
|
120 |
|
121 val err_fn = (fn _ => "foo"); |
|
122 val p = (!! err_fn ($$ "h")) || ($$ "w"); |
|
123 val input1 = (explode "hello"); |
|
124 val input2 = (explode "world"); |
|
125 |
|
126 p input1; |
|
127 *} |
|
128 |
|
129 text {* (FIXME: why does @{ML_text "p input2"} not do anything with foo?) *} |
|
130 |
|
131 text {* (FIXME: explain function application) *} |
|
132 |
|
133 ML {* fun parse_fn (x,y) = (x,y^y) *} |
|
134 |
|
135 ML {* ((($$ "h") -- ($$ "e")) >> parse_fn) (explode "hello") *} |
|
136 |
|
137 text {* (FIXME: explain @{ML_text "lift"}) *} |
5 |
138 |
6 chapter {* Parsing *} |
139 chapter {* Parsing *} |
7 |
140 |
8 text {* |
141 text {* |
9 |
142 |