CookBook/Parsing.thy
changeset 56 126646f2aa88
parent 54 1783211b3494
child 58 f3794c231898
equal deleted inserted replaced
55:0b55402ae95e 56:126646f2aa88
    90   @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
    90   @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
    91                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    91                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    92 
    92 
    93   Note how the result of consumed strings builds up on the left as nested pairs.  
    93   Note how the result of consumed strings builds up on the left as nested pairs.  
    94 
    94 
    95   Parsers that explore 
    95   If, as in the previous example, one wants to parse a particular string, 
    96   alternatives can be constructed using the function @{ML "(op ||)"}. For example, the 
    96   then one should rather use the function @{ML Scan.this_string}:
    97   parser @{ML "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds, 
    97 
    98   otherwise it returns the result of @{ML_text "q"}. For example
    98   @{ML_response [display] "Scan.this_string \"hel\" (explode \"hello\")"
       
    99                           "(\"hel\", [\"l\", \"o\"])"}
       
   100 
       
   101   Parsers that explore alternatives can be constructed using the function @{ML
       
   102   "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the
       
   103   result of @{ML_text "p"}, in case it succeeds, otherwise it returns the
       
   104   result of @{ML_text "q"}. For example
       
   105 
    99 
   106 
   100 @{ML_response [display] 
   107 @{ML_response [display] 
   101 "let 
   108 "let 
   102   val hw = ($$ \"h\") || ($$ \"w\")
   109   val hw = ($$ \"h\") || ($$ \"w\")
   103   val input1 = (explode \"hello\")
   110   val input1 = (explode \"hello\")
   238                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   245                 "([\"h\", \"h\", \"h\", \"h\"], [])"}
   239 
   246 
   240   However, this kind of manually wrapping needs to be done only very rarely
   247   However, this kind of manually wrapping needs to be done only very rarely
   241   in practise, because it is already done by the infrastructure for you. 
   248   in practise, because it is already done by the infrastructure for you. 
   242 
   249 
       
   250   The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any 
       
   251   string as in
       
   252 
       
   253   @{ML_response [display] 
       
   254 "let 
       
   255    val p = Scan.repeat (Scan.one Symbol.not_eof)
       
   256    val input = (explode \"foo bar foo\") 
       
   257 in
       
   258    Scan.finite Symbol.stopper p input
       
   259 end" 
       
   260 "([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
       
   261 
       
   262   where the function @{ML Symbol.not_eof} ensures that we do not read beyond the 
       
   263   end of the input string.
       
   264 
       
   265   The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
       
   266   parse the input then the whole parser fails; if not, then the second is tried. Therefore
       
   267   
       
   268   @{ML_response_fake_both [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"
       
   269                                "Exception FAIL raised"}
       
   270 
       
   271   fails, while
       
   272 
       
   273   @{ML_response [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"world\")"
       
   274                           "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
       
   275 
       
   276   succeeds. 
       
   277 
       
   278   The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any
       
   279   input until a certain marker symbol is reached. In the example below the marker
       
   280   symbol is a @{text [quotes] "*"}:
       
   281 
       
   282   @{ML_response [display]
       
   283 "let 
       
   284   val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
       
   285   val input1 = (explode \"fooooo\")
       
   286   val input2 = (explode \"foo*ooo\")
       
   287 in
       
   288   (Scan.finite Symbol.stopper p input1, 
       
   289    Scan.finite Symbol.stopper p input2)
       
   290 end"
       
   291 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
       
   292  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
       
   293 
   243   After parsing succeeded, one nearly always wants to apply a function on the parsed 
   294   After parsing succeeded, one nearly always wants to apply a function on the parsed 
   244   items. This is done using the function @{ML "(p >> f)" for p f} which runs 
   295   items. This is done using the function @{ML "(p >> f)" for p f} which runs 
   245   first the parser @{ML_text p} and upon successful completion applies the 
   296   first the parser @{ML_text p} and upon successful completion applies the 
   246   function @{ML_text f} to the result. For example
   297   function @{ML_text f} to the result. For example
   247 
   298 
   253 end"
   304 end"
   254 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
   305 "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
   255 
   306 
   256   doubles the two parsed input strings.
   307   doubles the two parsed input strings.
   257  
   308  
       
   309   \begin{exercise}\label{ex:scancmts}
       
   310   Write a parser parses an input string so that any comment enclosed inside
       
   311   @{text "(*\<dots>*)"} is enclosed inside @{text "(**\<dots>**)"} in the output
       
   312   string. To enclose a string, you can use the function @{ML "enclose s1 s2 s"
       
   313   for s1 s2 s} which produces the string @{ML "s1^s^s2" for s1 s2 s}.
       
   314   \end{exercise}
       
   315 
       
   316 
   258   The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
   317   The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
   259   the given parser to the second component of the pair and leaves the  first component 
   318   the given parser to the second component of the pair and leaves the  first component 
   260   untouched. For example
   319   untouched. For example
   261 
   320 
   262 @{ML_response [display]
   321 @{ML_response [display]