CookBook/Parsing.thy
changeset 39 631d12c25bde
parent 38 e21b2f888fa2
child 40 35e1dff0d9bb
equal deleted inserted replaced
38:e21b2f888fa2 39:631d12c25bde
     8 text {*
     8 text {*
     9 
     9 
    10   Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. 
    10   Isabelle distinguishes between \emph{outer} and \emph{inner} syntax. 
    11   Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so
    11   Theory commands, such as \isacommand{definition}, \isacommand{inductive} and so
    12   on, belong to the outer syntax, whereas items inside double quotation marks, such 
    12   on, belong to the outer syntax, whereas items inside double quotation marks, such 
    13   as terms and types, belong to the inner syntax. For parsing inner syntax, 
    13   as terms, types and so on, belong to the inner syntax. For parsing inner syntax, 
    14   Isabelle uses a rather general and sophisticated algorithm due to Earley, which 
    14   Isabelle uses a rather general and sophisticated algorithm due to Earley, which 
    15   is driven by priority grammars. Parsers for outer syntax are built up by functional
    15   is driven by priority grammars. Parsers for outer syntax are built up by functional
    16   parsing combinators. These combinators are a well-established technique for parsing, 
    16   parsing combinators. These combinators are a well-established technique for parsing, 
    17   which has, for example, been described in Paulson's classic book \cite{paulson-ml2}.
    17   which has, for example, been described in Paulson's classic book \cite{paulson-ml2}.
    18   
    18   
    28 
    28 
    29 *}
    29 *}
    30 
    30 
    31 section {* Building Up Generic Parsers *}
    31 section {* Building Up Generic Parsers *}
    32 
    32 
    33 text {*
    33 
       
    34 text {*
       
    35 
    34   Let us first have a look at parsing strings using generic parsing combinators. 
    36   Let us first have a look at parsing strings using generic parsing combinators. 
    35   The function @{ML "$$"} takes a string and will ``consume'' this string from 
    37   The function @{ML "(op $$)"} takes a string and will ``consume'' this string from 
    36   a given input list of strings. Consume in this context means that it will 
    38   a given input list of strings. ``Consume'' in this context means that it will 
    37   return a pair consisting of this string and the rest of the list. 
    39   return a pair consisting of this string and the rest of the input list. 
    38   For example:
    40   For example:
    39 *}
    41 
    40 
    42   @{ML_response [display] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
    41 ML {* ($$ "h") (explode "hello"); 
    43   @{ML_response [display] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}
    42        ($$ "w") (explode "world") *}
    44 
    43 
    45   This function will either succeed (as in the two examples above) or raise the exeption 
    44 text {*
    46   @{ML_text "FAIL"} if no string can be consumed. For example in the next example
    45   returns @{ML "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} and then 
    47   try to parse: 
    46   @{ML "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}. This function will either succeed (as in
    48 
    47   the two examples above) or raise the exeption @{ML_text "FAIL"} if no string
    49   @{ML_text [display] "($$ \"x\") (explode \"world\")"}
    48   can be consumed: for example @{ML_text "($$ \"x\") (explode \"world\")"}.
    50 
    49 
    51   There are three exceptions used in the parsing combinators:
    50   Two such parser can be connected in sequence using the funtion @{ML "(op --)"}. 
    52 
       
    53   (FIXME: describe)
       
    54 
       
    55   \begin{itemize}
       
    56   \item @{ML_text "FAIL"}
       
    57   \item @{ML_text "MORE"}
       
    58   \item @{ML_text "ABORT"}
       
    59   \end{itemize}
       
    60 
       
    61   Slightly more general than @{ML "(op $$)"} is the function @{ML Scan.one} in that it 
       
    62   takes a predicate as argument and then parses exactly one item from the input list 
       
    63   satisfying this prediate. For example the following parser either consumes an @{ML_text "h"}
       
    64   or a @{ML_text "w"}:
       
    65 
       
    66 @{ML_response [display] 
       
    67 "let val hw = Scan.one (fn x => x = \"h\" orelse x = \"w\")
       
    68   val input1 = (explode \"hello\")
       
    69   val input2 = (explode \"world\")
       
    70 in
       
    71     (hw input1, hw input2)
       
    72 end"
       
    73     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
       
    74 
       
    75   Two parser can be connected in sequence by using the funtion @{ML "(op --)"}. 
    51   For example   
    76   For example   
    52 *}
    77 
    53 
    78   @{ML_response [display] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
    54 ML {* (($$ "h") -- ($$ "e") -- ($$ "l")) (explode "hello") *}
    79                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    55 
    80 
    56 text {*
    81   Note how the result of consumed strings builds up on the left as nested pairs.  
    57   returns @{ML "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}. Note how the result of
       
    58   consumed strings builds up on the left as nested pairs.  
       
    59 
    82 
    60   Parsers that explore 
    83   Parsers that explore 
    61   alternatives can be constructed using the function @{ML "(op ||)"}. For example, the 
    84   alternatives can be constructed using the function @{ML "(op ||)"}. For example, the 
    62   parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, 
    85   parser @{ML_open "p || q" for p q} returns the result of @{ML_text "p"}, if it succeeds, 
    63   otherwise it returns the result of @{ML_text "q"}. For example
    86   otherwise it returns the result of @{ML_text "q"}. For example
    64 *}
    87 
    65 
    88 @{ML_response [display] 
    66 ML {*
    89 "let val hw = ($$ \"h\") || ($$ \"w\")
    67   let val hw = ($$ "h") || ($$ "w")
    90   val input1 = (explode \"hello\")
    68       val input1 = (explode "hello")
    91   val input2 = (explode \"world\")
    69       val input2 = (explode "world")
    92 in
    70   in
    93   (hw input1, hw input2)
    71     (hw input1, hw input2)
    94 end"
    72   end
    95   "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
    73 *}
    96 
    74 
       
    75 text {*
       
    76   will in the first case consume the @{ML_text "h"} and in the second the @{ML_text "w"}.
    97   will in the first case consume the @{ML_text "h"} and in the second the @{ML_text "w"}.
    77   The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion 
    98   The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing funtion 
    78   for parsers, except that they discard the item parsed by the first (respectively second)
    99   for parsers, except that they discard the item parsed by the first (respectively second)
    79   parser. For example
   100   parser. For example
    80 *}
   101   
    81 
   102 @{ML_response [display]
    82 ML {* 
   103 "let val just_h = ($$ \"h\") |-- ($$ \"e\") 
    83   let val just_h = ($$ "h") --| ($$ "e") 
   104   val just_e = ($$ \"h\") --| ($$ \"e\") 
    84       val just_e = ($$ "h") |-- ($$ "e") 
   105   val input = (explode \"hello\")  
    85       val input = (explode "hello")  
   106 in 
    86   in 
   107   (just_h input, just_e input)
    87    (just_h input, just_e input)
   108 end"
    88   end
   109   "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}
    89 *}
       
    90 
       
    91 text {*
       
    92   produces @{ML "(\"h\", [\"l\", \"l\", \"o\"])"} and @{ML "(\"e\", [\"l\", \"l\", \"o\"])"}, 
       
    93   respectively. 
       
    94 
       
    95   (FIXME does it make sense to explain the other functions from @{ML_text BASIC_SCAN}?)
       
    96 
   110 
    97   The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser 
   111   The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser 
    98   @{ML_text "p"}, if it succeeds; otherwise it returns 
   112   @{ML_text "p"}, if it succeeds; otherwise it returns 
    99   the default value @{ML_text "x"}. For example
   113   the default value @{ML_text "x"}. For example
   100 *}
   114 
   101 
   115 @{ML_response [display]
   102 ML {* 
   116 "let val p = Scan.optional ($$ \"h\") \"x\"
   103   let val p = Scan.optional ($$ "h") "x"
   117   val input1 = (explode \"hello\")
   104       val input1 = (explode "hello")
   118   val input2 = (explode \"world\")  
   105       val input2 = (explode "world")  
   119 in 
   106   in 
   120   (p input1, p input2)
   107    (p input1, p input2)
   121 end" 
   108   end
   122  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   109 *}
   123 
   110 
   124   The function @{ML "(op !!)"} helps to produce appropriate error messages
   111 text {*
   125   during parsing. 
   112   returns @{ML "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"} in the first case and 
   126 
   113   in the second case @{ML "(\"x\", [\"w\", \"o\", \"r\", \"l\", \"d\"])"}.
   127 *}
   114   
       
   115 *}
       
   116 
       
   117 text {* (FIXME: explain @{ML "op !!"}) *}
       
   118 
   128 
   119 ML {*
   129 ML {*
   120   
   130   
   121   val err_fn = (fn _ => "foo");
   131   val err_fn = (fn _ => "foo");
   122   val p = (!! err_fn ($$ "h"))  || ($$ "w");
   132   val p = (!! err_fn ($$ "h"))  || ($$ "w");
   123   val input1 = (explode "hello");
   133   val input1 = (explode "hello");
   124   val input2 = (explode "world");
   134   val input2 = (explode "world");
   125   
   135 *}  
   126   p input1;
   136 
       
   137 ML {*
       
   138 
       
   139   (*Scan.error p input2;*)
   127 *}
   140 *}
   128 
   141 
   129 text {* (FIXME: why does @{ML_text "p input2"} not do anything with foo?) *} 
   142 text {* (FIXME: why does @{ML_text "p input2"} not do anything with foo?) *} 
   130 
   143 
   131 text {* (FIXME: explain function application) *}
   144 text {* (FIXME: explain function application) *}