CookBook/Parsing.thy
changeset 40 35e1dff0d9bb
parent 39 631d12c25bde
child 41 b11653b11bd3
equal deleted inserted replaced
39:631d12c25bde 40:35e1dff0d9bb
    48 
    48 
    49   @{ML_text [display] "($$ \"x\") (explode \"world\")"}
    49   @{ML_text [display] "($$ \"x\") (explode \"world\")"}
    50 
    50 
    51   There are three exceptions used in the parsing combinators:
    51   There are three exceptions used in the parsing combinators:
    52 
    52 
    53   (FIXME: describe)
    53   (FIXME: describe exceptions)
    54 
    54 
    55   \begin{itemize}
    55   \begin{itemize}
    56   \item @{ML_text "FAIL"}
    56   \item @{ML_text "FAIL"}
    57   \item @{ML_text "MORE"}
    57   \item @{ML_text "MORE"} @{ML_text "($$ \"h\") []"}
    58   \item @{ML_text "ABORT"}
    58   \item @{ML_text "ABORT"} dead end
    59   \end{itemize}
    59   \end{itemize}
    60 
    60 
    61   Slightly more general than @{ML "(op $$)"} is the function @{ML Scan.one} in that it 
    61   Slightly more general than @{ML "(op $$)"} is the function @{ML Scan.one} in that it 
    62   takes a predicate as argument and then parses exactly one item from the input list 
    62   takes a predicate as argument and then parses exactly one item from the input list 
    63   satisfying this prediate. For example the following parser either consumes an @{ML_text "h"}
    63   satisfying this prediate. For example the following parser either consumes an @{ML_text "h"}
    64   or a @{ML_text "w"}:
    64   or a @{ML_text "w"}:
    65 
    65 
    66 @{ML_response [display] 
    66 @{ML_response [display] 
    67 "let val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
    67 "let 
       
    68   val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
    68   val input1 = (explode \"hello\")
    69   val input1 = (explode \"hello\")
    69   val input2 = (explode \"world\")
    70   val input2 = (explode \"world\")
    70 in
    71 in
    71     (hw input1, hw input2)
    72     (hw input1, hw input2)
    72 end"
    73 end"
    84   alternatives can be constructed using the function @{ML "(op ||)"}. For example, the 
    85   alternatives can be constructed using the function @{ML "(op ||)"}. For example, the 
    85   parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, 
    86   parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, 
    86   otherwise it returns the result of @{ML_text "q"}. For example
    87   otherwise it returns the result of @{ML_text "q"}. For example
    87 
    88 
    88 @{ML_response [display] 
    89 @{ML_response [display] 
    89 "let val hw = ($$ \"h\") || ($$ \"w\")
    90 "let 
       
    91   val hw = ($$ \"h\") || ($$ \"w\")
    90   val input1 = (explode \"hello\")
    92   val input1 = (explode \"hello\")
    91   val input2 = (explode \"world\")
    93   val input2 = (explode \"world\")
    92 in
    94 in
    93   (hw input1, hw input2)
    95   (hw input1, hw input2)
    94 end"
    96 end"
    95   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
    97   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
    96 
    98 
    97   will in the first case consume the @{ML_text "h"} and in the second the @{ML_text "w"}.
       
    98   The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion 
    99   The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion 
    99   for parsers, except that they discard the item parsed by the first (respectively second)
   100   for parsers, except that they discard the item parsed by the first (respectively second)
   100   parser. For example
   101   parser. For example
   101   
   102   
   102 @{ML_response [display]
   103 @{ML_response [display]
   103 "let val just_h = ($$ \"h\") |-- ($$ \"e\") 
   104 "let 
       
   105   val just_h = ($$ \"h\") |-- ($$ \"e\") 
   104   val just_e = ($$ \"h\") --| ($$ \"e\") 
   106   val just_e = ($$ \"h\") --| ($$ \"e\") 
   105   val input = (explode \"hello\")  
   107   val input = (explode \"hello\")  
   106 in 
   108 in 
   107   (just_h input, just_e input)
   109   (just_h input, just_e input)
   108 end"
   110 end"
   111   The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser 
   113   The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser 
   112   @{ML_text "p"}, if it succeeds; otherwise it returns 
   114   @{ML_text "p"}, if it succeeds; otherwise it returns 
   113   the default value @{ML_text "x"}. For example
   115   the default value @{ML_text "x"}. For example
   114 
   116 
   115 @{ML_response [display]
   117 @{ML_response [display]
   116 "let val p = Scan.optional ($$ \"h\") \"x\"
   118 "let 
       
   119   val p = Scan.optional ($$ \"h\") \"x\"
   117   val input1 = (explode \"hello\")
   120   val input1 = (explode \"hello\")
   118   val input2 = (explode \"world\")  
   121   val input2 = (explode \"world\")  
   119 in 
   122 in 
   120   (p input1, p input2)
   123   (p input1, p input2)
   121 end" 
   124 end" 
   122  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   125  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   123 
   126 
   124   The function @{ML "(op !!)"} helps to produce appropriate error messages
   127   The function @{ML "(op !!)"} helps to produce appropriate error messages
   125   during parsing. 
   128   during parsing. For example if one wants to parse @{ML_text p} immediately 
   126 
   129   followed by @{ML_text q}, or start a completely different parser @{ML_text r},
       
   130   one might write
       
   131 
       
   132   @{ML_open [display] "(p -- q) || r" for p q r}
       
   133 
       
   134   However, this way is problematic for producing an appropriate error message, in case
       
   135   the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information 
       
   136   that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p} 
       
   137   is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the 
       
   138   alternative parser @{ML_text r} will be tried. In many circumstances this will be the wrong
       
   139   parser for the input and therefore probably fail. However, the error message is then caused by the
       
   140   failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations
       
   141   can be avoided using the funtion @{ML "(op !!)"}, which aborts the whole process of
       
   142   parsing and invokes an error message. For example if we invoke the parser
       
   143   
       
   144   @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
       
   145 
       
   146   on @{ML_text "hello"}, the parsing succeeds
       
   147 
       
   148   @{ML_response [display] 
       
   149                 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")" 
       
   150                 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
       
   151 
       
   152   In contrast if we invoke it on @{ML_text "world"}
       
   153   
       
   154   @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"}
       
   155 
       
   156   the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to
       
   157   see the error message properly, we need to prefix the parser with the function 
       
   158   @{ML "Scan.error"}. For example
       
   159 
       
   160   @{ML [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"}
       
   161 
       
   162   This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"} 
       
   163   (FIXME: see below). 
       
   164   
       
   165   Lets return to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
       
   166   to generate the correct error message for @{ML_text q} not following @{ML_text p}, then
       
   167   we have to write
   127 *}
   168 *}
   128 
   169 
   129 ML {*
   170 ML {*
   130   
   171   fun p_followed_by_q p q r =
   131   val err_fn = (fn _ => "foo");
   172   let 
   132   val p = (!! err_fn ($$ "h"))  || ($$ "w");
   173    val err = (fn _ => p ^ " is not followed by " ^ q)
   133   val input1 = (explode "hello");
   174   in
   134   val input2 = (explode "world");
   175   (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r))
   135 *}  
   176   end
   136 
   177 *}
   137 ML {*
   178 
   138 
   179 text {*
   139   (*Scan.error p input2;*)
   180   Running this parser with
   140 *}
   181 
   141 
   182   @{ML_text [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"}
   142 text {* (FIXME: why does @{ML_text "p input2"} not do anything with foo?) *} 
   183 
   143 
   184   gives the correct error message and running it with
   144 text {* (FIXME: explain function application) *}
   185  
   145 
   186   @{ML_response [display] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"wworld\")"
   146 ML {* fun parse_fn (x,y) = (x,y^y) *}
   187                           "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}
   147 
   188   
   148 ML {* ((($$ "h") -- ($$ "e")) >> parse_fn) (explode "hello") *}
   189   yields the expected parsing. 
   149 
   190 
   150 text {* (FIXME: explain @{ML_text "lift"}) *}
   191   The function @{ML "Scan.repeat"} will apply a parser as often as it succeeds. For examle
       
   192   
       
   193   @{ML_response "Scan.repeat ($$ \"h\") (explode \"hhhhello\")" 
       
   194                 "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}
       
   195   
       
   196   Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
       
   197   @{ML "Scan.repeat1"} is similar, but requires that in @{ML_open  "Scan.repeat1 p" for p}
       
   198   the parse @{ML_text "p"} succeeds at least once.
       
   199 *}
       
   200  
       
   201 text {* 
       
   202   After parsing succeeded, one wants to apply functions on the parsed items. This is
       
   203   done using the function @{ML_open "(p >> f)" for p f} which applies first the 
       
   204   parser @{ML_text p} upon successful completion applies the function @{ML_text f}.
       
   205   For example
       
   206 
       
   207 @{ML_response [display]
       
   208 "let 
       
   209   fun double (x,y) = (x^x,y^y) 
       
   210 in
       
   211   (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
       
   212 end"
       
   213 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
       
   214 
       
   215   The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
       
   216   the given parser to the second component of the pair and leaves the  first component 
       
   217   untouched. For example
       
   218 
       
   219 @{ML_response [display]
       
   220 "Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))"
       
   221 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
       
   222 
       
   223   (FIXME: In which situations is this useful?) 
       
   224 *}
       
   225 
       
   226 section {* Parsing Tokens *}
       
   227 
       
   228 text {*
       
   229   Most of the time, however, we will have to deal with tokens that are not just strings.
       
   230   The parsers for the theory syntax, as well as the parsers for the argument syntax
       
   231   of proof methods and attributes use the token type @{ML_type OuterParse.token},
       
   232   which is identical to @{ML_type OuterLex.token}.
       
   233   The parser functions for the theory syntax are contained in the structure
       
   234   @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
       
   235 *}
       
   236 
   151 
   237 
   152 chapter {* Parsing *}
   238 chapter {* Parsing *}
   153 
   239 
   154 text {*
   240 text {*
   155 
   241