CookBook/Parsing.thy
changeset 108 8bea3f74889d
parent 105 f49dc7e96235
child 114 13fd0a83d3c3
equal deleted inserted replaced
107:258ce361ba1b 108:8bea3f74889d
    35 section {* Building Generic Parsers *}
    35 section {* Building Generic Parsers *}
    36 
    36 
    37 text {*
    37 text {*
    38 
    38 
    39   Let us first have a look at parsing strings using generic parsing combinators. 
    39   Let us first have a look at parsing strings using generic parsing combinators. 
    40   The function @{ML "(op $$)"} takes a string as argument and will ``consume'' this string from 
    40   The function @{ML "$$"} takes a string as argument and will ``consume'' this string from 
    41   a given input list of strings. ``Consume'' in this context means that it will 
    41   a given input list of strings. ``Consume'' in this context means that it will 
    42   return a pair consisting of this string and the rest of the input list. 
    42   return a pair consisting of this string and the rest of the input list. 
    43   For example:
    43   For example:
    44 
    44 
    45   @{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
    45   @{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
    59   \item @{text "FAIL"} is used to indicate that alternative routes of parsing 
    59   \item @{text "FAIL"} is used to indicate that alternative routes of parsing 
    60   might be explored. 
    60   might be explored. 
    61   \item @{text "MORE"} indicates that there is not enough input for the parser. For example 
    61   \item @{text "MORE"} indicates that there is not enough input for the parser. For example 
    62   in @{text "($$ \"h\") []"}.
    62   in @{text "($$ \"h\") []"}.
    63   \item @{text "ABORT"} is the exception that is raised when a dead end is reached. 
    63   \item @{text "ABORT"} is the exception that is raised when a dead end is reached. 
    64   It is used for example in the function @{ML "(op !!)"} (see below).
    64   It is used for example in the function @{ML "!!"} (see below).
    65   \end{itemize}
    65   \end{itemize}
    66 
    66 
    67   However, note that these exceptions are private to the parser and cannot be accessed
    67   However, note that these exceptions are private to the parser and cannot be accessed
    68   by the programmer (for example to handle them).
    68   by the programmer (for example to handle them).
    69   
    69   
    70   Slightly more general than the parser @{ML "(op $$)"} is the function @{ML
    70   Slightly more general than the parser @{ML "$$"} is the function @{ML
    71   Scan.one}, in that it takes a predicate as argument and then parses exactly
    71   Scan.one}, in that it takes a predicate as argument and then parses exactly
    72   one item from the input list satisfying this predicate. For example the
    72   one item from the input list satisfying this predicate. For example the
    73   following parser either consumes an @{text [quotes] "h"} or a @{text
    73   following parser either consumes an @{text [quotes] "h"} or a @{text
    74   [quotes] "w"}:
    74   [quotes] "w"}:
    75 
    75 
    82 in
    82 in
    83     (hw input1, hw input2)
    83     (hw input1, hw input2)
    84 end"
    84 end"
    85     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
    85     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
    86 
    86 
    87   Two parser can be connected in sequence by using the function @{ML "(op --)"}. 
    87   Two parser can be connected in sequence by using the function @{ML "--"}. 
    88   For example parsing @{text "h"}, @{text "e"} and @{text "l"} in this 
    88   For example parsing @{text "h"}, @{text "e"} and @{text "l"} in this 
    89   sequence you can achieve by:
    89   sequence you can achieve by:
    90 
    90 
    91   @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
    91   @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
    92                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    92                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    98 
    98 
    99   @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
    99   @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
   100                           "(\"hell\", [\"o\"])"}
   100                           "(\"hell\", [\"o\"])"}
   101 
   101 
   102   Parsers that explore alternatives can be constructed using the function @{ML
   102   Parsers that explore alternatives can be constructed using the function @{ML
   103   "(op ||)"}. For example, the parser @{ML "(p || q)" for p q} returns the
   103   "||"}. For example, the parser @{ML "(p || q)" for p q} returns the
   104   result of @{text "p"}, in case it succeeds, otherwise it returns the
   104   result of @{text "p"}, in case it succeeds, otherwise it returns the
   105   result of @{text "q"}. For example:
   105   result of @{text "q"}. For example:
   106 
   106 
   107 
   107 
   108 @{ML_response [display,gray] 
   108 @{ML_response [display,gray] 
   113 in
   113 in
   114   (hw input1, hw input2)
   114   (hw input1, hw input2)
   115 end"
   115 end"
   116   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   116   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
   117 
   117 
   118   The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing function 
   118   The functions @{ML "|--"} and @{ML "--|"} work like the sequencing function 
   119   for parsers, except that they discard the item being parsed by the first (respectively second)
   119   for parsers, except that they discard the item being parsed by the first (respectively second)
   120   parser. For example:
   120   parser. For example:
   121   
   121   
   122 @{ML_response [display,gray]
   122 @{ML_response [display,gray]
   123 "let 
   123 "let 
   153   val input2 = (explode \"world\")  
   153   val input2 = (explode \"world\")  
   154 in 
   154 in 
   155   (p input1, p input2)
   155   (p input1, p input2)
   156 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   156 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   157 
   157 
   158   The function @{ML "(op !!)"} helps to produce appropriate error messages
   158   The function @{ML "!!"} helps to produce appropriate error messages
   159   during parsing. For example if you want to parse that @{text p} is immediately 
   159   during parsing. For example if you want to parse that @{text p} is immediately 
   160   followed by @{text q}, or start a completely different parser @{text r},
   160   followed by @{text q}, or start a completely different parser @{text r},
   161   you might write:
   161   you might write:
   162 
   162 
   163   @{ML [display,gray] "(p -- q) || r" for p q r}
   163   @{ML [display,gray] "(p -- q) || r" for p q r}
   169   the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail
   169   the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail
   170   and the alternative parser @{text r} will be tried. However in many
   170   and the alternative parser @{text r} will be tried. However in many
   171   circumstance this will be the wrong parser for the input ``p-followed-by-q''
   171   circumstance this will be the wrong parser for the input ``p-followed-by-q''
   172   and therefore will also fail. The error message is then caused by the
   172   and therefore will also fail. The error message is then caused by the
   173   failure of @{text r}, not by the absence of @{text q} in the input. This
   173   failure of @{text r}, not by the absence of @{text q} in the input. This
   174   kind of situation can be avoided when using the function @{ML "(op !!)"}. 
   174   kind of situation can be avoided when using the function @{ML "!!"}. 
   175   This function aborts the whole process of parsing in case of a
   175   This function aborts the whole process of parsing in case of a
   176   failure and prints an error message. For example if you invoke the parser
   176   failure and prints an error message. For example if you invoke the parser
   177 
   177 
   178   
   178   
   179   @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
   179   @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
   187   but if you invoke it on @{text [quotes] "world"}
   187   but if you invoke it on @{text [quotes] "world"}
   188   
   188   
   189   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
   189   @{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
   190                                "Exception ABORT raised"}
   190                                "Exception ABORT raised"}
   191 
   191 
   192   then the parsing aborts and the error message @{text "foo"} is printed out. In order to
   192   then the parsing aborts and the error message @{text "foo"} is printed. In order to
   193   see the error message properly, we need to prefix the parser with the function 
   193   see the error message properly, we need to prefix the parser with the function 
   194   @{ML "Scan.error"}. For example:
   194   @{ML "Scan.error"}. For example:
   195 
   195 
   196   @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
   196   @{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
   197                                "Exception Error \"foo\" raised"}
   197                                "Exception Error \"foo\" raised"}
   428 in 
   428 in 
   429   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
   429   (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
   430 end"
   430 end"
   431 "((\"where\",\<dots>),(\"|\",\<dots>))"}
   431 "((\"where\",\<dots>),(\"|\",\<dots>))"}
   432 
   432 
   433   Like before, you can sequentially connect parsers with @{ML "(op --)"}. For example: 
   433   Like before, you can sequentially connect parsers with @{ML "--"}. For example: 
   434 
   434 
   435 @{ML_response [display,gray]
   435 @{ML_response [display,gray]
   436 "let 
   436 "let 
   437   val input = filtered_input \"| in\"
   437   val input = filtered_input \"| in\"
   438 in 
   438 in 
   569   indicated by @{ML NoSyn}.
   569   indicated by @{ML NoSyn}.
   570  
   570  
   571   (FIXME: should for-fixes take any syntax annotation?)
   571   (FIXME: should for-fixes take any syntax annotation?)
   572 
   572 
   573   @{ML OuterParse.for_fixes} is an ``optional'' that prefixes 
   573   @{ML OuterParse.for_fixes} is an ``optional'' that prefixes 
   574   @{ML OuterParse.fixes} with the comman \isacommand{for}.
   574   @{ML OuterParse.fixes} with the command \isacommand{for}.
   575   (FIXME give an example and explain more)
   575   (FIXME give an example and explain more)
   576 
   576 
   577 @{ML_response [display,gray]
   577 @{ML_response [display,gray]
   578 "let
   578 "let
   579   val input = filtered_input 
   579   val input = filtered_input