CookBook/Parsing.thy
changeset 149 253ea99c1441
parent 133 3e94ccc0f31e
child 156 e8f11280c762
equal deleted inserted replaced
148:84d1392186d3 149:253ea99c1441
    43 
    43 
    44   @{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
    44   @{ML_response [display,gray] "($$ \"h\") (explode \"hello\")" "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
    45 
    45 
    46   @{ML_response [display,gray] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}
    46   @{ML_response [display,gray] "($$ \"w\") (explode \"world\")" "(\"w\", [\"o\", \"r\", \"l\", \"d\"])"}
    47 
    47 
    48   This function will either succeed (as in the two examples above) or raise the exception 
    48   The function @{ML "$$"} will either succeed (as in the two examples above) or raise the exception 
    49   @{text "FAIL"} if no string can be consumed. For example trying to parse
    49   @{text "FAIL"} if no string can be consumed. For example trying to parse
    50 
    50 
    51   @{ML_response_fake [display,gray] "($$ \"x\") (explode \"world\")" 
    51   @{ML_response_fake [display,gray] "($$ \"x\") (explode \"world\")" 
    52                                "Exception FAIL raised"}
    52                                "Exception FAIL raised"}
    53   
    53   
   316 "(\"foo bar foo\",[])"}
   316 "(\"foo bar foo\",[])"}
   317 
   317 
   318   where the single-character strings in the parsed output are transformed
   318   where the single-character strings in the parsed output are transformed
   319   back into one string.
   319   back into one string.
   320  
   320  
       
   321   The function @{ML Scan.ahead} parses some input, but leaves the original
       
   322   input unchanged. For example:
       
   323 
       
   324   @{ML_response [display,gray]
       
   325   "Scan.ahead (Scan.this_string \"foo\") (explode \"foo\")" 
       
   326   "(\"foo\", [\"f\", \"o\", \"o\"])"} 
       
   327 
       
   328   The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
       
   329   the given parser to the second component of the pair and leaves the  first component 
       
   330   untouched. For example
       
   331 
       
   332 @{ML_response [display,gray]
       
   333 "Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))"
       
   334 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
       
   335 
       
   336   (FIXME: In which situations is this useful? Give examples.) 
       
   337 
   321   \begin{exercise}\label{ex:scancmts}
   338   \begin{exercise}\label{ex:scancmts}
   322   Write a parser that parses an input string so that any comment enclosed
   339   Write a parser that parses an input string so that any comment enclosed
   323   inside @{text "(*\<dots>*)"} is replaced by a the same comment but enclosed inside
   340   inside @{text "(*\<dots>*)"} is replaced by a the same comment but enclosed inside
   324   @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
   341   @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
   325   function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
   342   function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
   326   "s1 ^ s ^ s2" for s1 s2 s}.
   343   "s1 ^ s ^ s2" for s1 s2 s}.
   327   \end{exercise}
   344   \end{exercise}
   328 
       
   329   The function @{ML Scan.ahead} parses some input, but leaves the original
       
   330   input unchanged. For example:
       
   331 
       
   332   @{ML_response [display,gray]
       
   333   "Scan.ahead (Scan.this_string \"foo\") (explode \"foo\")" 
       
   334   "(\"foo\", [\"f\", \"o\", \"o\"])"} 
       
   335 
       
   336   The function @{ML Scan.lift} takes a parser and a pair as arguments. This function applies
       
   337   the given parser to the second component of the pair and leaves the  first component 
       
   338   untouched. For example
       
   339 
       
   340 @{ML_response [display,gray]
       
   341 "Scan.lift (($$ \"h\") -- ($$ \"e\")) (1,(explode \"hello\"))"
       
   342 "((\"h\", \"e\"), (1, [\"l\", \"l\", \"o\"]))"}
       
   343 
       
   344   (FIXME: In which situations is this useful? Give examples.) 
       
   345 *}
   345 *}
   346 
   346 
   347 section {* Parsing Theory Syntax *}
   347 section {* Parsing Theory Syntax *}
   348 
   348 
   349 text {*
   349 text {*
   352 *}
   352 *}
   353   
   353   
   354 ML{*type 'a parser = OuterLex.token list -> 'a * OuterLex.token list*}
   354 ML{*type 'a parser = OuterLex.token list -> 'a * OuterLex.token list*}
   355 
   355 
   356 text {*
   356 text {*
   357   This reason for using token parsers is that theory syntax, as well as the
   357   The reason for using token parsers is that theory syntax, as well as the
   358   parsers for the arguments of proof methods, use the type @{ML_type
   358   parsers for the arguments of proof methods, use the type @{ML_type
   359   OuterLex.token} (which is identical to the type @{ML_type
   359   OuterLex.token} (which is identical to the type @{ML_type
   360   OuterParse.token}).  However, there are also handy parsers for
   360   OuterParse.token}).  However, there are also handy parsers for
   361   ML-expressions and ML-files.
   361   ML-expressions and ML-files.
   362 
   362 
   574 
   574 
   575   @{ML_response [display,gray]
   575   @{ML_response [display,gray]
   576   "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
   576   "YXML.parse \"\\^E\\^Ftoken\\^Efoo\\^E\\^F\\^E\""
   577   "XML.Elem (\"token\", [], [XML.Text \"foo\"])"}
   577   "XML.Elem (\"token\", [], [XML.Text \"foo\"])"}
   578 
   578 
   579   This function returns an XML-tree. You can see better what is going on if
   579   The result of the decoding is an XML-tree. You can see better what is going on if
   580   you replace @{ML Position.none} by @{ML "Position.line 42"}, say:
   580   you replace @{ML Position.none} by @{ML "Position.line 42"}, say:
   581 
   581 
   582 @{ML_response [display,gray]
   582 @{ML_response [display,gray]
   583 "let 
   583 "let 
   584   val input = OuterSyntax.scan (Position.line 42) \"foo\"
   584   val input = OuterSyntax.scan (Position.line 42) \"foo\"
   585 in 
   585 in 
   586   YXML.parse (fst (OuterParse.term input))
   586   YXML.parse (fst (OuterParse.term input))
   587 end"
   587 end"
   588 "XML.Elem (\"token\", [(\"line\", \"42\"), (\"end_line\", \"42\")], [XML.Text \"foo\"])"}
   588 "XML.Elem (\"token\", [(\"line\", \"42\"), (\"end_line\", \"42\")], [XML.Text \"foo\"])"}
   589   
   589   
   590   The positional information is stored so that code called later on will be
   590   The positional information is stored as part of an XML-tree so that code 
   591   able to give more precise error messages. 
   591   called later on will be able to give more precise error messages. 
   592 
   592 
   593   \begin{readmore}
   593   \begin{readmore}
   594   The functions to do with input and output of XML and YXML are defined 
   594   The functions to do with input and output of XML and YXML are defined 
   595   in @{ML_file "Pure/General/xml.ML"} and @{ML_file "Pure/General/yxml.ML"}.
   595   in @{ML_file "Pure/General/xml.ML"} and @{ML_file "Pure/General/yxml.ML"}.
   596   \end{readmore}
   596   \end{readmore}
   690   (bar, SOME \"\\^E\\^Ftoken\\^Enat\\^E\\^F\\^E\", Mixfix (\"BAR\", [], 100)), 
   690   (bar, SOME \"\\^E\\^Ftoken\\^Enat\\^E\\^F\\^E\", Mixfix (\"BAR\", [], 100)), 
   691   (blonk, NONE, NoSyn)],[])"}  
   691   (blonk, NONE, NoSyn)],[])"}  
   692 *}
   692 *}
   693 
   693 
   694 text {*
   694 text {*
   695   Whenever types are given, they are stored in the @{ML SOME}s. Since types
   695   Whenever types are given, they are stored in the @{ML SOME}s. They types are
   696   are part of the inner syntax they are strings with some encoded information 
   696   not yet given to the variable: this must be done by type inference later
   697   (see previous section). 
   697   on. Since types are part of the inner syntax they are strings with some
   698   If a syntax translation is present for a variable, then it is
   698   encoded information (see previous section). If a syntax translation is
   699   stored in the @{ML Mixfix} datastructure; no syntax translation is
   699   present for a variable, then it is stored in the @{ML Mixfix} datastructure;
   700   indicated by @{ML NoSyn}.
   700   no syntax translation is indicated by @{ML NoSyn}.
   701 
   701 
   702   \begin{readmore}
   702   \begin{readmore}
   703   The datastructre for sytax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
   703   The datastructre for sytax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
   704   \end{readmore}
   704   \end{readmore}
   705 
   705 
   927   \isacommand{definition} and \isacommand{declare}.  In other cases,
   927   \isacommand{definition} and \isacommand{declare}.  In other cases,
   928   commands are expected to parse some arguments, for example a proposition,
   928   commands are expected to parse some arguments, for example a proposition,
   929   and then ``open up'' a proof in order to prove the proposition (for example
   929   and then ``open up'' a proof in order to prove the proposition (for example
   930   \isacommand{lemma}) or prove some other properties (for example
   930   \isacommand{lemma}) or prove some other properties (for example
   931   \isacommand{function}). To achieve this kind of behaviour, you have to use the kind
   931   \isacommand{function}). To achieve this kind of behaviour, you have to use the kind
   932   indicator @{ML thy_goal in OuterKeyword}.
   932   indicator @{ML thy_goal in OuterKeyword}.  Note, however, once you change the 
       
   933   ``kind'' of a command from @{ML thy_decl in OuterKeyword} to @{ML thy_goal in OuterKeyword} 
       
   934   then the keyword file needs to be re-created.
   933 
   935 
   934   Below we change \isacommand{foobar} so that it takes a proposition as
   936   Below we change \isacommand{foobar} so that it takes a proposition as
   935   argument and then starts a proof in order to prove it. Therefore in Line 13, 
   937   argument and then starts a proof in order to prove it. Therefore in Line 13, 
   936   we set the kind indicator to @{ML thy_goal in OuterKeyword}.
   938   we set the kind indicator to @{ML thy_goal in OuterKeyword}.
   937 *}
   939 *}
   979   \isacommand{apply}@{text "(rule conjI)"}\\
   981   \isacommand{apply}@{text "(rule conjI)"}\\
   980   \isacommand{apply}@{text "(rule TrueI)+"}\\
   982   \isacommand{apply}@{text "(rule TrueI)+"}\\
   981   \isacommand{done}
   983   \isacommand{done}
   982   \end{isabelle}
   984   \end{isabelle}
   983 
   985 
   984   However, once you change the ``kind'' of a command from @{ML thy_decl in OuterKeyword}
   986  
   985   to @{ML thy_goal in OuterKeyword} then the keyword file needs to be re-created.
       
   986   
   987   
   987   (FIXME What do @{ML "Toplevel.theory"} 
   988   (FIXME What do @{ML "Toplevel.theory"} 
   988   @{ML "Toplevel.print"} 
   989   @{ML "Toplevel.print"} 
   989   @{ML Toplevel.local_theory} do?)
   990   @{ML Toplevel.local_theory} do?)
   990 
   991