CookBook/Parsing.thy
changeset 56 126646f2aa88
parent 54 1783211b3494
child 58 f3794c231898
--- a/CookBook/Parsing.thy	Mon Dec 15 10:48:27 2008 +0100
+++ b/CookBook/Parsing.thy	Tue Dec 16 08:08:44 2008 +0000
@@ -92,10 +92,17 @@
 
   Note how the result of consumed strings builds up on the left as nested pairs.  
 
-  Parsers that explore 
-  alternatives can be constructed using the function @{ML "(op ||)"}. For example, the 
-  parser @{ML "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds, 
-  otherwise it returns the result of @{ML_text "q"}. For example
+  If, as in the previous example, one wants to parse a particular string, 
+  then one should rather use the function @{ML Scan.this_string}:
+
+  @{ML_response [display] "Scan.this_string \"hel\" (explode \"hello\")"
+                          "(\"hel\", [\"l\", \"o\"])"}
+
+  Parsers that explore alternatives can be constructed using the function @{ML
+  "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the
+  result of @{ML_text "p"}, in case it succeeds, otherwise it returns the
+  result of @{ML_text "q"}. For example
+
 
 @{ML_response [display] 
 "let 
@@ -240,6 +247,50 @@
   However, this kind of manually wrapping needs to be done only very rarely
   in practise, because it is already done by the infrastructure for you. 
 
+  The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any 
+  string as in
+
+  @{ML_response [display] 
+"let 
+   val p = Scan.repeat (Scan.one Symbol.not_eof)
+   val input = (explode \"foo bar foo\") 
+in
+   Scan.finite Symbol.stopper p input
+end" 
+"([\"f\", \"o\", \"o\", \" \", \"b\", \"a\", \"r\", \" \", \"f\", \"o\", \"o\"], [])"}
+
+  where the function @{ML Symbol.not_eof} ensures that we do not read beyond the 
+  end of the input string.
+
+  The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
+  parse the input then the whole parser fails; if not, then the second is tried. Therefore
+  
+  @{ML_response_fake_both [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"
+                               "Exception FAIL raised"}
+
+  fails, while
+
+  @{ML_response [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"world\")"
+                          "(\"w\",[\"o\", \"r\", \"l\", \"d\"])"}
+
+  succeeds. 
+
+  The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any
+  input until a certain marker symbol is reached. In the example below the marker
+  symbol is a @{text [quotes] "*"}:
+
+  @{ML_response [display]
+"let 
+  val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
+  val input1 = (explode \"fooooo\")
+  val input2 = (explode \"foo*ooo\")
+in
+  (Scan.finite Symbol.stopper p input1, 
+   Scan.finite Symbol.stopper p input2)
+end"
+"(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
+ ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
+
   After parsing succeeded, one nearly always wants to apply a function on the parsed 
   items. This is done using the function @{ML "(p >> f)" for p f} which runs 
   first the parser @{ML_text p} and upon successful completion applies the 
@@ -255,6 +306,14 @@
 
   doubles the two parsed input strings.
  
+  \begin{exercise}\label{ex:scancmts}
+  Write a parser parses an input string so that any comment enclosed inside
+  @{text "(*\<dots>*)"} is enclosed inside @{text "(**\<dots>**)"} in the output
+  string. To enclose a string, you can use the function @{ML "enclose s1 s2 s"
+  for s1 s2 s} which produces the string @{ML "s1^s^s2" for s1 s2 s}.
+  \end{exercise}
+
+
   The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
   the given parser to the second component of the pair and leaves the  first component 
   untouched. For example