CookBook/Parsing.thy
changeset 60 5b9c6010897b
parent 59 b5914f3c643c
child 65 c8e9a4f97916
equal deleted inserted replaced
59:b5914f3c643c 60:5b9c6010897b
    57   \begin{itemize}
    57   \begin{itemize}
    58   \item @{text "FAIL"} is used to indicate that alternative routes of parsing 
    58   \item @{text "FAIL"} is used to indicate that alternative routes of parsing 
    59   might be explored. 
    59   might be explored. 
    60   \item @{text "MORE"} indicates that there is not enough input for the parser. For example 
    60   \item @{text "MORE"} indicates that there is not enough input for the parser. For example 
    61   in @{text "($$ \"h\") []"}.
    61   in @{text "($$ \"h\") []"}.
    62   \item @{text "ABORT"} is the exception which is raised when a dead end is reached. 
    62   \item @{text "ABORT"} is the exception that is raised when a dead end is reached. 
    63   It is used for example in the function @{ML "(op !!)"} (see below).
    63   It is used for example in the function @{ML "(op !!)"} (see below).
    64   \end{itemize}
    64   \end{itemize}
    65 
    65 
    66   However, note that these exceptions are private to the parser and cannot be accessed
    66   However, note that these exceptions are private to the parser and cannot be accessed
    67   by the programmer (for example to handle them).
    67   by the programmer (for example to handle them).
    91                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    91                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    92 
    92 
    93   Note how the result of consumed strings builds up on the left as nested pairs.  
    93   Note how the result of consumed strings builds up on the left as nested pairs.  
    94 
    94 
    95   If, as in the previous example, one wants to parse a particular string, 
    95   If, as in the previous example, one wants to parse a particular string, 
    96   then one should rather use the function @{ML Scan.this_string}:
    96   then one should use the function @{ML Scan.this_string}:
    97 
    97 
    98   @{ML_response [display] "Scan.this_string \"hel\" (explode \"hello\")"
    98   @{ML_response [display] "Scan.this_string \"hel\" (explode \"hello\")"
    99                           "(\"hel\", [\"l\", \"o\"])"}
    99                           "(\"hel\", [\"l\", \"o\"])"}
   100 
   100 
   101   Parsers that explore alternatives can be constructed using the function @{ML
   101   Parsers that explore alternatives can be constructed using the function @{ML
   261 
   261 
   262   where the function @{ML Symbol.not_eof} ensures that we do not read beyond the 
   262   where the function @{ML Symbol.not_eof} ensures that we do not read beyond the 
   263   end of the input string.
   263   end of the input string.
   264 
   264 
   265   The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
   265   The function @{ML "Scan.unless p q" for p q} takes two parsers: if the first one can 
   266   parse the input then the whole parser fails; if not, then the second is tried. Therefore
   266   parse the input, then the whole parser fails; if not, then the second is tried. Therefore
   267   
   267   
   268   @{ML_response_fake_both [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"
   268   @{ML_response_fake_both [display] "Scan.unless ($$ \"h\") ($$ \"w\") (explode \"hello\")"
   269                                "Exception FAIL raised"}
   269                                "Exception FAIL raised"}
   270 
   270 
   271   fails, while
   271   fails, while
   275 
   275 
   276   succeeds. 
   276   succeeds. 
   277 
   277 
   278   The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any
   278   The functions @{ML Scan.repeat} and @{ML Scan.unless} can be combined to read any
   279   input until a certain marker symbol is reached. In the example below the marker
   279   input until a certain marker symbol is reached. In the example below the marker
   280   symbol is a @{text [quotes] "*"} which causes the parser to stop:
   280   symbol is a @{text [quotes] "*"}.
   281 
   281 
   282   @{ML_response [display]
   282   @{ML_response [display]
   283 "let 
   283 "let 
   284   val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
   284   val p = Scan.repeat (Scan.unless ($$ \"*\") (Scan.one Symbol.not_eof))
   285   val input1 = (explode \"fooooo\")
   285   val input1 = (explode \"fooooo\")
   313 in
   313 in
   314    Scan.finite Symbol.stopper p input
   314    Scan.finite Symbol.stopper p input
   315 end" 
   315 end" 
   316 "(\"foo bar foo\",[])"}
   316 "(\"foo bar foo\",[])"}
   317 
   317 
   318   where the single strings in the parsed output are transformed
   318   where the single-character strings in the parsed output are transformed
   319   back into one string.
   319   back into one string.
   320  
   320  
   321   \begin{exercise}\label{ex:scancmts}
   321   \begin{exercise}\label{ex:scancmts}
   322   Write a parser that parses an input string so that any comment enclosed
   322   Write a parser that parses an input string so that any comment enclosed
   323   inside @{text "(*\<dots>*)"} is replaced by a the same comment but enclosed inside
   323   inside @{text "(*\<dots>*)"} is replaced by a the same comment but enclosed inside