CookBook/Parsing.thy
changeset 41 b11653b11bd3
parent 40 35e1dff0d9bb
child 42 cd612b489504
equal deleted inserted replaced
40:35e1dff0d9bb 41:b11653b11bd3
    28 
    28 
    29 *}
    29 *}
    30 
    30 
    31 section {* Building Up Generic Parsers *}
    31 section {* Building Up Generic Parsers *}
    32 
    32 
    33 
       
    34 text {*
    33 text {*
    35 
    34 
    36   Let us first have a look at parsing strings using generic parsing combinators. 
    35   Let us first have a look at parsing strings using generic parsing combinators. 
    37   The function @{ML "(op $$)"} takes a string and will ``consume'' this string from 
    36   The function @{ML "(op $$)"} takes a string and will ``consume'' this string from 
    38   a given input list of strings. ``Consume'' in this context means that it will 
    37   a given input list of strings. ``Consume'' in this context means that it will 
    41 
    40 
    42   @{ML_response [display] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
    41   @{ML_response [display] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
    43   @{ML_response [display] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}
    42   @{ML_response [display] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}
    44 
    43 
    45   This function will either succeed (as in the two examples above) or raise the exeption 
    44   This function will either succeed (as in the two examples above) or raise the exeption 
    46   @{ML_text "FAIL"} if no string can be consumed. For example in the next example
    45   @{ML_text "FAIL"} if no string can be consumed. For example trying to parse
    47   try to parse: 
    46 
    48 
    47   @{ML_response_fake [display] "($$ \"x\") (explode \"world\")" 
    49   @{ML_text [display] "($$ \"x\") (explode \"world\")"}
    48                                "Exception FAIL raised"}
    50 
    49   
       
    50   will raise the exception @{ML_text "FAIL"}.
    51   There are three exceptions used in the parsing combinators:
    51   There are three exceptions used in the parsing combinators:
    52 
    52 
    53   (FIXME: describe exceptions)
       
    54 
       
    55   \begin{itemize}
    53   \begin{itemize}
    56   \item @{ML_text "FAIL"}
    54   \item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing 
    57   \item @{ML_text "MORE"} @{ML_text "($$ \"h\") []"}
    55   might be explored. 
    58   \item @{ML_text "ABORT"} dead end
    56   \item @{ML_text "MORE"} indicates that there is not enough input for the parser. For example 
       
    57   in @{ML_text "($$ \"h\") []"}.
       
    58   \item @{ML_text "ABORT"} is the exception which is raised when a dead end is reached. 
       
    59   It is used for example in the function @{ML "(op !!)"} (see below).
    59   \end{itemize}
    60   \end{itemize}
    60 
    61 
    61   Slightly more general than @{ML "(op $$)"} is the function @{ML Scan.one} in that it 
    62 
       
    63   Slightly more general than the parser @{ML "(op $$)"} is the function @{ML Scan.one}, in that it 
    62   takes a predicate as argument and then parses exactly one item from the input list 
    64   takes a predicate as argument and then parses exactly one item from the input list 
    63   satisfying this prediate. For example the following parser either consumes an @{ML_text "h"}
    65   satisfying this prediate. For example the following parser either consumes an @{ML_text "h"}
    64   or a @{ML_text "w"}:
    66   or a @{ML_text "w"}:
    65 
    67 
    66 @{ML_response [display] 
    68 @{ML_response [display] 
    72     (hw input1, hw input2)
    74     (hw input1, hw input2)
    73 end"
    75 end"
    74     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
    76     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
    75 
    77 
    76   Two parser can be connected in sequence by using the funtion @{ML "(op --)"}. 
    78   Two parser can be connected in sequence by using the funtion @{ML "(op --)"}. 
    77   For example   
    79   For example parsing @{ML_text "h"}, @{ML_text "e"} and @{ML_text "l"} in this 
       
    80   sequence can be achieved by
    78 
    81 
    79   @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
    82   @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
    80                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    83                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    81 
    84 
    82   Note how the result of consumed strings builds up on the left as nested pairs.  
    85   Note how the result of consumed strings builds up on the left as nested pairs.  
    83 
    86 
    84   Parsers that explore 
    87   Parsers that explore 
    85   alternatives can be constructed using the function @{ML "(op ||)"}. For example, the 
    88   alternatives can be constructed using the function @{ML "(op ||)"}. For example, the 
    86   parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, 
    89   parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, if it succeeds, 
    87   otherwise it returns the result of @{ML_text "q"}. For example
    90   otherwise it returns the result of @{ML_text "q"}. For example
    88 
    91 
    89 @{ML_response [display] 
    92 @{ML_response [display] 
    90 "let 
    93 "let 
    91   val hw = ($$ \"h\") || ($$ \"w\")
    94   val hw = ($$ \"h\") || ($$ \"w\")
   129   followed by @{ML_text q}, or start a completely different parser @{ML_text r},
   132   followed by @{ML_text q}, or start a completely different parser @{ML_text r},
   130   one might write
   133   one might write
   131 
   134 
   132   @{ML_open [display] "(p -- q) || r" for p q r}
   135   @{ML_open [display] "(p -- q) || r" for p q r}
   133 
   136 
   134   However, this way is problematic for producing an appropriate error message, in case
   137   However, this parser is problematic for producing an appropriate error message, in case
   135   the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information 
   138   the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information 
   136   that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p} 
   139   that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p} 
   137   is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the 
   140   is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the 
   138   alternative parser @{ML_text r} will be tried. In many circumstances this will be the wrong
   141   alternative parser @{ML_text r} will be tried. However in many circumstanes this will be the wrong
   139   parser for the input and therefore probably fail. However, the error message is then caused by the
   142   parser for the input and therefore will fail. The error message is then caused by the
   140   failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations
   143   failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations
   141   can be avoided using the funtion @{ML "(op !!)"}, which aborts the whole process of
   144   can be avoided using the funtion @{ML "(op !!)"}. This function aborts the whole process of
   142   parsing and invokes an error message. For example if we invoke the parser
   145   parsing in case of failure and invokes an error message. For example if we invoke the parser
   143   
   146   
   144   @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
   147   @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
   145 
   148 
   146   on @{ML_text "hello"}, the parsing succeeds
   149   on @{ML_text "hello"}, the parsing succeeds
   147 
   150 
   148   @{ML_response [display] 
   151   @{ML_response [display] 
   149                 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" 
   152                 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" 
   150                 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
   153                 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
   151 
   154 
   152   In contrast if we invoke it on @{ML_text "world"}
   155   but if we invoke it on @{ML_text "world"}
   153   
   156   
   154   @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"}
   157   @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
       
   158                                "Exception ABORT raised"}
   155 
   159 
   156   the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to
   160   the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to
   157   see the error message properly, we need to prefix the parser with the function 
   161   see the error message properly, we need to prefix the parser with the function 
   158   @{ML "Scan.error"}. For example
   162   @{ML "Scan.error"}. For example
   159 
   163 
   160   @{ML [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"}
   164   @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"
       
   165                                "Exception Error \"foo\" raised"}
   161 
   166 
   162   This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
   167   This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
   163   (FIXME: see below). 
   168   (FIXME: give reference to late). 
   164   
   169   
   165   Lets return to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
   170   Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
   166   to generate the correct error message for @{ML_text q} not following @{ML_text p}, then
   171   to generate the correct error message for @{ML_text q} not following @{ML_text p}, then
   167   we have to write
   172   we have to write
   168 *}
   173 *}
   169 
   174 
   170 ML {*
   175 ML {*
   171   fun p_followed_by_q p q r =
   176   fun p_followed_by_q p q r =
   172   let 
   177   let 
   173    val err = (fn _ => p ^ " is not followed by " ^ q)
   178      val err = (fn _ => p ^ " is not followed by " ^ q)
   174   in
   179   in
   175   (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r))
   180     (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r))
   176   end
   181   end
   177 *}
   182 *}
   178 
   183 
       
   184 
   179 text {*
   185 text {*
   180   Running this parser with
   186   Running this parser with
   181 
   187 
   182   @{ML_text [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"}
   188   @{ML_response_fake [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"
   183 
   189                                "Exception ERROR \"h is not followed by e\" raised"} 
   184   gives the correct error message and running it with
   190 
       
   191   gives the correct error message. Running it with
   185  
   192  
   186   @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
   193   @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
   187                           "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
   194                           "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
   188   
   195   
   189   yields the expected parsing. 
   196   yields the expected parsing. 
   190 
   197 
   191   The function @{ML "Scan.repeat"} will apply a parser as often as it succeeds. For examle
   198   The function @{ML_open "Scan.repeat p" for p} will apply a parser @{ML_text p} as 
   192   
   199   often as it succeeds. For example
   193   @{ML_response "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" 
   200   
       
   201   @{ML_response [display] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" 
   194                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
   202                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
   195   
   203   
   196   Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
   204   Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
   197   @{ML "Scan.repeat1"} is similar, but requires that in @{ML_open  "Scan.repeat1 p" for p}
   205   @{ML "Scan.repeat1"} is similar, but requires that the parser @{ML_text "p"} 
   198   the parse @{ML_text "p"} succeeds at least once.
   206   succeeds at least once.
   199 *}
   207 *}
   200  
   208  
   201 text {* 
   209 text {* 
   202   After parsing succeeded, one wants to apply functions on the parsed items. This is
   210   After parsing succeeded, one nearly always wants to apply a function on the parsed 
   203   done using the function @{ML_open "(p >> f)" for p f} which applies first the 
   211   items. This is done using the function @{ML_open "(p >> f)" for p f} which runs 
   204   parser @{ML_text p} upon successful completion applies the function @{ML_text f}.
   212   first the parser @{ML_text p} and upon successful completion applies the 
   205   For example
   213   function @{ML_text f} to the result. For example
   206 
   214 
   207 @{ML_response [display]
   215 @{ML_response [display]
   208 "let 
   216 "let 
   209   fun double (x,y) = (x^x,y^y) 
   217   fun double (x,y) = (x^x,y^y) 
   210 in
   218 in
   211   (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
   219   (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
   212 end"
   220 end"
   213 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
   221 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
   214 
   222 
       
   223   doubles the two parsed input strings.
       
   224  
   215   The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
   225   The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
   216   the given parser to the second component of the pair and leaves the  first component 
   226   the given parser to the second component of the pair and leaves the  first component 
   217   untouched. For example
   227   untouched. For example
   218 
   228 
   219 @{ML_response [display]
   229 @{ML_response [display]
   221 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
   231 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
   222 
   232 
   223   (FIXME: In which situations is this useful?) 
   233   (FIXME: In which situations is this useful?) 
   224 *}
   234 *}
   225 
   235 
   226 section {* Parsing Tokens *}
   236 section {* Parsing Theory Syntax *}
   227 
   237 
   228 text {*
   238 text {*
   229   Most of the time, however, we will have to deal with tokens that are not just strings.
   239   Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings.
   230   The parsers for the theory syntax, as well as the parsers for the argument syntax
   240   This is because the parsers for the theory syntax, as well as the parsers for the 
   231   of proof methods and attributes use the token type @{ML_type OuterParse.token},
   241   argument syntax of proof methods and attributes, use the type 
   232   which is identical to @{ML_type OuterLex.token}.
   242   @{ML_type OuterParse.token} (which is identical to the type @{ML_type OuterLex.token}).
   233   The parser functions for the theory syntax are contained in the structure
   243   The parser functions for the theory syntax are contained in the structure
   234   @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
   244   @{ML_struct OuterParse} defined in the file\linebreak @{ML_file  "Pure/Isar/outer_parse.ML"}.
   235 *}
   245 *}
       
   246 
       
   247 ML {* map OuterLex.mk_text (explode "hello") *} 
       
   248 
       
   249 text {* 
       
   250 
       
   251   @{ML_response_fake [display] "map OuterLex.mk_text (explode \"hello\")"  
       
   252        "[Token (\<dots>), Token (\<dots>), Token (\<dots>), Token (\<dots>), Token (\<dots>)]"} 
       
   253 
       
   254 *}
       
   255 
       
   256 ML {*
       
   257   OuterLex.mk_text "hello"
       
   258 *}
       
   259 
       
   260 ML {*
       
   261 
       
   262   let val input = [OuterLex.mk_text "hello", OuterLex.mk_text "world"];
       
   263   in (Scan.one (fn _ => true)) input end
       
   264 
       
   265 *}
       
   266 
   236 
   267 
   237 
   268 
   238 chapter {* Parsing *}
   269 chapter {* Parsing *}
   239 
   270 
   240 text {*
   271 text {*