28 |
28 |
29 *} |
29 *} |
30 |
30 |
31 section {* Building Up Generic Parsers *} |
31 section {* Building Up Generic Parsers *} |
32 |
32 |
33 text {* |
33 |
|
34 text {* |
|
35 |
34 Let us first have a look at parsing strings using generic parsing combinators. |
36 Let us first have a look at parsing strings using generic parsing combinators. |
35 The function @{ML "$$"} takes a string and will ``consume'' this string from |
37 The function @{ML "(op $$)"} takes a string and will ``consume'' this string from |
36 a given input list of strings. Consume in this context means that it will |
38 a given input list of strings. ``Consume'' in this context means that it will |
37 return a pair consisting of this string and the rest of the list. |
39 return a pair consisting of this string and the rest of the input list. |
38 For example: |
40 For example: |
39 *} |
41 |
40 |
42 @{ML_response [display] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} |
41 ML {* ($$ "h") (explode "hello"); |
43 @{ML_response [display] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"} |
42 ($$ "w") (explode "world") *} |
44 |
43 |
45 This function will either succeed (as in the two examples above) or raise the exeption |
44 text {* |
46 @{ML_text "FAIL"} if no string can be consumed. For example in the next example |
45 returns @{ML "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} and then |
47 try to parse: |
46 @{ML "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}. This function will either succeed (as in |
48 |
47 the two examples above) or raise the exeption @{ML_text "FAIL"} if no string |
49 @{ML_text [display] "($$ \"x\") (explode \"world\")"} |
48 can be consumed: for example @{ML_text "($$ \"x\") (explode \"world\")"}. |
50 |
49 |
51 There are three exceptions used in the parsing combinators: |
50 Two such parser can be connected in sequence using the funtion @{ML "(op --)"}. |
52 |
|
53 (FIXME: describe) |
|
54 |
|
55 \begin{itemize} |
|
56 \item @{ML_text "FAIL"} |
|
57 \item @{ML_text "MORE"} |
|
58 \item @{ML_text "ABORT"} |
|
59 \end{itemize} |
|
60 |
|
61 Slightly more general than @{ML "(op $$)"} is the function @{ML Scan.one} in that it |
|
62 takes a predicate as argument and then parses exactly one item from the input list |
|
63 satisfying this prediate. For example the following parser either consumes an @{ML_text "h"} |
|
64 or a @{ML_text "w"}: |
|
65 |
|
66 @{ML_response [display] |
|
67 "let val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\") |
|
68 val input1 = (explode \"hello\") |
|
69 val input2 = (explode \"world\") |
|
70 in |
|
71 (hw input1, hw input2) |
|
72 end" |
|
73 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
|
74 |
|
75 Two parser can be connected in sequence by using the funtion @{ML "(op --)"}. |
51 For example |
76 For example |
52 *} |
77 |
53 |
78 @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" |
54 ML {* (($$ "h") -- ($$ "e") -- ($$ "l")) (explode "hello") *} |
79 "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} |
55 |
80 |
56 text {* |
81 Note how the result of consumed strings builds up on the left as nested pairs. |
57 returns @{ML "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}. Note how the result of |
|
58 consumed strings builds up on the left as nested pairs. |
|
59 |
82 |
60 Parsers that explore |
83 Parsers that explore |
61 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the |
84 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the |
62 parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, |
85 parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, |
63 otherwise it returns the result of @{ML_text "q"}. For example |
86 otherwise it returns the result of @{ML_text "q"}. For example |
64 *} |
87 |
65 |
88 @{ML_response [display] |
66 ML {* |
89 "let val hw = ($$ \"h\") || ($$ \"w\") |
67 let val hw = ($$ "h") || ($$ "w") |
90 val input1 = (explode \"hello\") |
68 val input1 = (explode "hello") |
91 val input2 = (explode \"world\") |
69 val input2 = (explode "world") |
92 in |
70 in |
93 (hw input1, hw input2) |
71 (hw input1, hw input2) |
94 end" |
72 end |
95 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} |
73 *} |
96 |
74 |
|
75 text {* |
|
76 will in the first case consume the @{ML_text "h"} and in the second the @{ML_text "w"}. |
97 will in the first case consume the @{ML_text "h"} and in the second the @{ML_text "w"}. |
77 The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion |
98 The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion |
78 for parsers, except that they discard the item parsed by the first (respectively second) |
99 for parsers, except that they discard the item parsed by the first (respectively second) |
79 parser. For example |
100 parser. For example |
80 *} |
101 |
81 |
102 @{ML_response [display] |
82 ML {* |
103 "let val just_h = ($$ \"h\") |-- ($$ \"e\") |
83 let val just_h = ($$ "h") --| ($$ "e") |
104 val just_e = ($$ \"h\") --| ($$ \"e\") |
84 val just_e = ($$ "h") |-- ($$ "e") |
105 val input = (explode \"hello\") |
85 val input = (explode "hello") |
106 in |
86 in |
107 (just_h input, just_e input) |
87 (just_h input, just_e input) |
108 end" |
88 end |
109 "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"} |
89 *} |
|
90 |
|
91 text {* |
|
92 produces @{ML "(\"h\", [\"l\", \"l\", \"o\"])"} and @{ML "(\"e\", [\"l\", \"l\", \"o\"])"}, |
|
93 respectively. |
|
94 |
|
95 (FIXME does it make sense to explain the other functions from @{ML_text BASIC_SCAN}?) |
|
96 |
110 |
97 The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser |
111 The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser |
98 @{ML_text "p"}, if it succeeds; otherwise it returns |
112 @{ML_text "p"}, if it succeeds; otherwise it returns |
99 the default value @{ML_text "x"}. For example |
113 the default value @{ML_text "x"}. For example |
100 *} |
114 |
101 |
115 @{ML_response [display] |
102 ML {* |
116 "let val p = Scan.optional ($$ \"h\") \"x\" |
103 let val p = Scan.optional ($$ "h") "x" |
117 val input1 = (explode \"hello\") |
104 val input1 = (explode "hello") |
118 val input2 = (explode \"world\") |
105 val input2 = (explode "world") |
119 in |
106 in |
120 (p input1, p input2) |
107 (p input1, p input2) |
121 end" |
108 end |
122 "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} |
109 *} |
123 |
110 |
124 The function @{ML "(op !!)"} helps to produce appropriate error messages |
111 text {* |
125 during parsing. |
112 returns @{ML "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} in the first case and |
126 |
113 in the second case @{ML "(\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"])"}. |
127 *} |
114 |
|
115 *} |
|
116 |
|
117 text {* (FIXME: explain @{ML "op !!"}) *} |
|
118 |
128 |
119 ML {* |
129 ML {* |
120 |
130 |
121 val err_fn = (fn _ => "foo"); |
131 val err_fn = (fn _ => "foo"); |
122 val p = (!! err_fn ($$ "h")) || ($$ "w"); |
132 val p = (!! err_fn ($$ "h")) || ($$ "w"); |
123 val input1 = (explode "hello"); |
133 val input1 = (explode "hello"); |
124 val input2 = (explode "world"); |
134 val input2 = (explode "world"); |
125 |
135 *} |
126 p input1; |
136 |
|
137 ML {* |
|
138 |
|
139 (*Scan.error p input2;*) |
127 *} |
140 *} |
128 |
141 |
129 text {* (FIXME: why does @{ML_text "p input2"} not do anything with foo?) *} |
142 text {* (FIXME: why does @{ML_text "p input2"} not do anything with foo?) *} |
130 |
143 |
131 text {* (FIXME: explain function application) *} |
144 text {* (FIXME: explain function application) *} |