CookBook/Parsing.thy
changeset 49 a0edabf14457
parent 48 609f9ef73494
child 50 3d4b49921cdb
equal deleted inserted replaced
48:609f9ef73494 49:a0edabf14457
    30   @{ML_file "Pure/Isar/outer_parse.ML"}.
    30   @{ML_file "Pure/Isar/outer_parse.ML"}.
    31   \end{readmore}
    31   \end{readmore}
    32 
    32 
    33 *}
    33 *}
    34 
    34 
    35 section {* Building Up Generic Parsers *}
    35 section {* Building Generic Parsers *}
    36 
    36 
    37 text {*
    37 text {*
    38 
    38 
    39   Let us first have a look at parsing strings using generic parsing combinators. 
    39   Let us first have a look at parsing strings using generic parsing combinators. 
    40   The function @{ML "(op $$)"} takes a string and will ``consume'' this string from 
    40   The function @{ML "(op $$)"} takes a string and will ``consume'' this string from 
    61   in @{ML_text "($$ \"h\") []"}.
    61   in @{ML_text "($$ \"h\") []"}.
    62   \item @{ML_text "ABORT"} is the exception which is raised when a dead end is reached. 
    62   \item @{ML_text "ABORT"} is the exception which is raised when a dead end is reached. 
    63   It is used for example in the function @{ML "(op !!)"} (see below).
    63   It is used for example in the function @{ML "(op !!)"} (see below).
    64   \end{itemize}
    64   \end{itemize}
    65 
    65 
    66   (FIXME: do the exception need to be explained, because the user cannot use them from ``outside''?)
    66   However, note that these exception private to the parser and cannot be accessed
    67 
    67   by the programmer (for example to handle them).
    68   Slightly more general than the parser @{ML "(op $$)"} is the function @{ML Scan.one}, in that it 
    68   
    69   takes a predicate as argument and then parses exactly one item from the input list 
    69   Slightly more general than the parser @{ML "(op $$)"} is the function @{ML
    70   satisfying this prediate. For example the following parser either consumes an 
    70   Scan.one}, in that it takes a predicate as argument and then parses exactly
    71   @{ML_text [quotes] "h"} or a @{ML_text [quotes] "w"}:
    71   one item from the input list satisfying this prediate. For example the
       
    72   following parser either consumes an @{ML_text [quotes] "h"} or a @{ML_text
       
    73   [quotes] "w"}:
       
    74 
    72 
    75 
    73 @{ML_response [display] 
    76 @{ML_response [display] 
    74 "let 
    77 "let 
    75   val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
    78   val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
    76   val input1 = (explode \"hello\")
    79   val input1 = (explode \"hello\")
   129   val input2 = (explode \"world\")  
   132   val input2 = (explode \"world\")  
   130 in 
   133 in 
   131   (p input1, p input2)
   134   (p input1, p input2)
   132 end" 
   135 end" 
   133  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   136  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
       
   137 
       
   138   The function @{ML Scan.option} works similarly, except no default value can
       
   139   be given and in the failure case this parser does nothing.
   134 
   140 
   135   The function @{ML "(op !!)"} helps to produce appropriate error messages
   141   The function @{ML "(op !!)"} helps to produce appropriate error messages
   136   during parsing. For example if one wants to parse that @{ML_text p} is immediately 
   142   during parsing. For example if one wants to parse that @{ML_text p} is immediately 
   137   followed by @{ML_text q}, or start a completely different parser @{ML_text r},
   143   followed by @{ML_text q}, or start a completely different parser @{ML_text r},
   138   one might write
   144   one might write
   212   
   218   
   213   Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
   219   Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
   214   @{ML "Scan.repeat1"} is similar, but requires that the parser @{ML_text "p"} 
   220   @{ML "Scan.repeat1"} is similar, but requires that the parser @{ML_text "p"} 
   215   succeeds at least once.
   221   succeeds at least once.
   216 
   222 
       
   223   Also note that the parser would have aborted with the exception @{ML_text MORE}, if
       
   224   we had run it only on just @{ML_text [quotes] "hhhh"}. This can be awoided using
       
   225   the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}. With
       
   226   them we can write
       
   227   
       
   228   @{ML_response [display] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")" 
       
   229                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
       
   230 
       
   231   However, the Isabelle-develloper rarely needs to do this kind of wrapping manually. 
       
   232 
   217   After parsing succeeded, one nearly always wants to apply a function on the parsed 
   233   After parsing succeeded, one nearly always wants to apply a function on the parsed 
   218   items. This is done using the function @{ML_open "(p >> f)" for p f} which runs 
   234   items. This is done using the function @{ML_open "(p >> f)" for p f} which runs 
   219   first the parser @{ML_text p} and upon successful completion applies the 
   235   first the parser @{ML_text p} and upon successful completion applies the 
   220   function @{ML_text f} to the result. For example
   236   function @{ML_text f} to the result. For example
   221 
   237 
   262   We can generate a token list using the function @{ML "OuterSyntax.scan"}, which we give
   278   We can generate a token list using the function @{ML "OuterSyntax.scan"}, which we give
   263   below @{ML "Position.none"} as argument since, at the moment, we are not interested in
   279   below @{ML "Position.none"} as argument since, at the moment, we are not interested in
   264   generating precise error messages. The following\footnote{There is something funny 
   280   generating precise error messages. The following\footnote{There is something funny 
   265   going on with the pretty printing of the result token list.}
   281   going on with the pretty printing of the result token list.}
   266 
   282 
   267 @{ML_response [display] "OuterSyntax.scan Position.none \"hello world\"" 
   283 @{ML_response_fake [display] "OuterSyntax.scan Position.none \"hello world\"" 
   268 "[OuterLex.Token (\<dots>,(OuterLex.Ident, \"hello\"),\<dots>), 
   284 "[Token (\<dots>,(OuterLex.Ident, \"hello\"),\<dots>), 
   269  OuterLex.Token (\<dots>,(OuterLex.Space, \" \"),\<dots>), 
   285  Token (\<dots>,(OuterLex.Space, \" \"),\<dots>), 
   270  OuterLex.Token (\<dots>,(OuterLex.Ident, \"world\"),\<dots>)]"}
   286  Token (\<dots>,(OuterLex.Ident, \"world\"),\<dots>)]"}
   271 
   287 
   272   produces three tokens where the first and the last are identifiers, since 
   288   produces three tokens where the first and the last are identifiers, since 
   273   @{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match 
   289   @{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match 
   274   any other category. The second indicates a space. Many parsing functions 
   290   any other category. The second indicates a space. Many parsing functions 
   275   later on will require spaces, comments and the like to have been filtered out.
   291   later on will require spaces, comments and the like to have been filtered out.
   276   If we parse
   292   If we parse
   277 
   293 
   278 @{ML_response [display] "OuterSyntax.scan Position.none \"inductive|for\"" 
   294 @{ML_response_fake [display] "OuterSyntax.scan Position.none \"inductive|for\"" 
   279 "[OuterLex.Token (\<dots>,(OuterLex.Command, \"inductive\"),\<dots>), 
   295 "[Token (\<dots>,(OuterLex.Command, \"inductive\"),\<dots>), 
   280  OuterLex.Token (\<dots>,(OuterLex.Keyword, \"|\"),\<dots>), 
   296  Token (\<dots>,(OuterLex.Keyword, \"|\"),\<dots>), 
   281  OuterLex.Token (\<dots>,(OuterLex.Keyword, \"for\"),\<dots>)]"}
   297  Token (\<dots>,(OuterLex.Keyword, \"for\"),\<dots>)]"}
   282 
   298 
   283   we obtain a list consiting of only command and keyword tokens.
   299   we obtain a list consiting of only command and keyword tokens.
   284   If you want to see which keywords and commands are currently known, use
   300   If you want to see which keywords and commands are currently known, use
   285   the following (you might have to adjust the @{ML print_depth} in order to
   301   the following (you might have to adjust the @{ML print_depth} in order to
   286   see the complete list):
   302   see the complete list):
   314 end"
   330 end"
   315 "((\"|\",\"in\"),[])"}
   331 "((\"|\",\"in\"),[])"}
   316 
   332 
   317   The parser @{ML_open "OuterParse.enum s p" for s p} parses a possibly empty 
   333   The parser @{ML_open "OuterParse.enum s p" for s p} parses a possibly empty 
   318   list of items recognised by the parser @{ML_text p}, where the items are 
   334   list of items recognised by the parser @{ML_text p}, where the items are 
   319   separated by @{ML_text s}. For example
   335   separated by the string @{ML_text s}. For example
   320 
   336 
   321 @{ML_response [display]
   337 @{ML_response [display]
   322 "let 
   338 "let 
   323   val input = OuterSyntax.scan Position.none \"in|in|in\\n\"
   339   val input = OuterSyntax.scan Position.none \"in|in|in\\n\"
   324 in 
   340 in 
   325   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
   341   (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
   326 end" 
   342 end" 
   327 "([\"in\",\"in\",\"in\"],[\<dots>])"}
   343 "([\"in\",\"in\",\"in\"],[\<dots>])"}
   328 
   344 
       
   345   @{ML "OuterParse.enum1"} works similarly,  except that the parsed list must 
       
   346   be non-empty.
       
   347 
   329   Note that we had to add a @{ML_text [quotes] "\\n"} at the end of the parsed
   348   Note that we had to add a @{ML_text [quotes] "\\n"} at the end of the parsed
   330   string, otherwise the parser would have consumed all tokens and then failed with
   349   string, otherwise the parser would have consumed all tokens and then failed with
   331   the exception @{ML_text "MORE"}. @{ML "OuterParse.enum1"} works similarly, 
   350   the exception @{ML_text "MORE"}. Like in the previous section, we can avoid
   332   except that the parsed list must be non-empty.
   351   this exception using the wrapper @{ML Scan.finite}. This time, however, we
   333 
   352   have to use the ``stopper-token'' @{ML OuterLex.stopper}. We can write
   334 *}
   353 
   335 
   354 @{ML_response [display]
   336 text {* FIXME explain @{ML "OuterParse.!!!"} *}
   355 "let 
       
   356   val input = OuterSyntax.scan Position.none \"in|in|in\"
       
   357 in 
       
   358   Scan.finite OuterLex.stopper (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
       
   359 end" 
       
   360 "([\"in\",\"in\",\"in\"],[])"}
       
   361 
       
   362   The function @{ML "OuterParse.!!!"} can be used to force termination of the
       
   363   parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section), 
       
   364   except that the error message is fixed to be @{text [quotes] "Outer syntax error"}
       
   365   with a relatively precise description of the failure. For example:
       
   366 
       
   367 @{ML_response_fake [display]
       
   368 "let 
       
   369   val input = OuterSyntax.scan Position.none \"in|\"
       
   370   val parse_bar_then_in = OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\"
       
   371 in 
       
   372   Scan.error (OuterParse.!!! parse_bar_then_in) input 
       
   373 end"
       
   374 "Exception ERROR \"Outer syntax error: keyword \"|\" expected, 
       
   375 but keyword in was found\" raised"
       
   376 }
       
   377 
       
   378 *}
       
   379 
       
   380 
       
   381 ML {*
       
   382 let 
       
   383   val input = filter OuterLex.is_proper (OuterSyntax.scan Position.none "(in foo)")
       
   384 in 
       
   385   OuterParse.target input
       
   386 end
       
   387 *}
       
   388 
       
   389 section {* Positional Information *}
       
   390 
       
   391 text {*
       
   392 
       
   393   @{ML OuterParse.position}
       
   394 
       
   395 *}
       
   396 
       
   397 ML {*
       
   398   OuterParse.position
       
   399 *}
       
   400 
   337 
   401 
   338 section {* Parsing Inner Syntax *}
   402 section {* Parsing Inner Syntax *}
   339 
   403 
   340 ML {*
   404 ML {*
   341 let 
   405 let