CookBook/Parsing.thy
changeset 38 e21b2f888fa2
parent 16 5045dec52d2b
child 39 631d12c25bde
equal deleted inserted replaced
37:403d4e0ad712 38:e21b2f888fa2
     1 theory Parsing
     1 theory Parsing
     2 imports Main
     2 imports Base
     3 
     3 
     4 begin
     4 begin
       
     5 
       
     6 chapter {* Parsing *}
       
     7 
       
     8 text {*
       
     9 
       
    10   Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. 
       
    11   Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so
       
    12   on, belong to the outer syntax, whereas items inside double quotation marks, such 
       
    13   as terms and types, belong to the inner syntax. For parsing inner syntax, 
       
    14   Isabelle uses a rather general and sophisticated algorithm due to Earley, which 
       
    15   is driven by priority grammars. Parsers for outer syntax are built up by functional
       
    16   parsing combinators. These combinators are a well-established technique for parsing, 
       
    17   which has, for example, been described in Paulson's classic book \cite{paulson-ml2}.
       
    18   
       
    19   Isabelle developers are usually concerned with writing parsers for outer
       
    20   syntax, either for new definitional packages or for calling tactics. The library 
       
    21   for writing such parsers in can roughly be split up into two parts. 
       
    22   The first part consists of a collection of generic parser combinators defined
       
    23   in the structure @{ML_struct Scan} in the file 
       
    24   @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of 
       
    25   combinators for dealing with specific token types, which are defined in the 
       
    26   structure @{ML_struct OuterParse} in the file 
       
    27   @{ML_file "Pure/Isar/outer_parse.ML"}.
       
    28 
       
    29 *}
       
    30 
       
    31 section {* Building Up Generic Parsers *}
       
    32 
       
    33 text {*
       
    34   Let us first have a look at parsing strings using generic parsing combinators. 
       
    35   The function @{ML "$$"} takes a string and will ``consume'' this string from 
       
    36   a given input list of strings. Consume in this context means that it will 
       
    37   return a pair consisting of this string and the rest of the list. 
       
    38   For example:
       
    39 *}
       
    40 
       
    41 ML {* ($$ "h") (explode "hello"); 
       
    42        ($$ "w") (explode "world") *}
       
    43 
       
    44 text {*
       
    45   returns @{ML "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} and then 
       
    46   @{ML "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}. This function will either succeed (as in
       
    47   the two examples above) or raise the exeption @{ML_text "FAIL"} if no string
       
    48   can be consumed: for example @{ML_text "($$ \"x\") (explode \"world\")"}.
       
    49 
       
    50   Two such parser can be connected in sequence using the funtion @{ML "(op --)"}. 
       
    51   For example   
       
    52 *}
       
    53 
       
    54 ML {* (($$ "h") -- ($$ "e") -- ($$ "l")) (explode "hello") *}
       
    55 
       
    56 text {*
       
    57   returns @{ML "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}. Note how the result of
       
    58   consumed strings builds up on the left as nested pairs.  
       
    59 
       
    60   Parsers that explore 
       
    61   alternatives can be constructed using the function @{ML "(op ||)"}. For example, the 
       
    62   parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, 
       
    63   otherwise it returns the result of @{ML_text "q"}. For example
       
    64 *}
       
    65 
       
    66 ML {*
       
    67   let val hw = ($$ "h") || ($$ "w")
       
    68       val input1 = (explode "hello")
       
    69       val input2 = (explode "world")
       
    70   in
       
    71     (hw input1, hw input2)
       
    72   end
       
    73 *}
       
    74 
       
    75 text {*
       
    76   will in the first case consume the @{ML_text "h"} and in the second the @{ML_text "w"}.
       
    77   The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion 
       
    78   for parsers, except that they discard the item parsed by the first (respectively second)
       
    79   parser. For example
       
    80 *}
       
    81 
       
    82 ML {* 
       
    83   let val just_h = ($$ "h") --| ($$ "e") 
       
    84       val just_e = ($$ "h") |-- ($$ "e") 
       
    85       val input = (explode "hello")  
       
    86   in 
       
    87    (just_h input, just_e input)
       
    88   end
       
    89 *}
       
    90 
       
    91 text {*
       
    92   produces @{ML "(\"h\", [\"l\", \"l\", \"o\"])"} and @{ML "(\"e\", [\"l\", \"l\", \"o\"])"}, 
       
    93   respectively. 
       
    94 
       
    95   (FIXME does it make sense to explain the other functions from @{ML_text BASIC_SCAN}?)
       
    96 
       
    97   The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser 
       
    98   @{ML_text "p"}, if it succeeds; otherwise it returns 
       
    99   the default value @{ML_text "x"}. For example
       
   100 *}
       
   101 
       
   102 ML {* 
       
   103   let val p = Scan.optional ($$ "h") "x"
       
   104       val input1 = (explode "hello")
       
   105       val input2 = (explode "world")  
       
   106   in 
       
   107    (p input1, p input2)
       
   108   end
       
   109 *}
       
   110 
       
   111 text {*
       
   112   returns @{ML "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} in the first case and 
       
   113   in the second case @{ML "(\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"])"}.
       
   114   
       
   115 *}
       
   116 
       
   117 text {* (FIXME: explain @{ML "op !!"}) *}
       
   118 
       
   119 ML {*
       
   120   
       
   121   val err_fn = (fn _ => "foo");
       
   122   val p = (!! err_fn ($$ "h"))  || ($$ "w");
       
   123   val input1 = (explode "hello");
       
   124   val input2 = (explode "world");
       
   125   
       
   126   p input1;
       
   127 *}
       
   128 
       
   129 text {* (FIXME: why does @{ML_text "p input2"} not do anything with foo?) *} 
       
   130 
       
   131 text {* (FIXME: explain function application) *}
       
   132 
       
   133 ML {* fun parse_fn (x,y) = (x,y^y) *}
       
   134 
       
   135 ML {* ((($$ "h") -- ($$ "e")) >> parse_fn) (explode "hello") *}
       
   136 
       
   137 text {* (FIXME: explain @{ML_text "lift"}) *}
     5 
   138 
     6 chapter {* Parsing *}
   139 chapter {* Parsing *}
     7 
   140 
     8 text {*
   141 text {*
     9 
   142