end"
  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

Two parsers can be connected in sequence by using the function @{ML "(op --)"}.
For example, you can parse @{text "h"}, @{text "e"} and @{text "l"} in this
sequence by writing:

@{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
  "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}

Note how the result of consumed strings builds up on the left as nested pairs.

If, as in the previous example, you want to parse a particular string,
then you should use the function @{ML Scan.this_string}:

@{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
  "(\"hell\", [\"o\"])"}

Parsers that explore alternatives can be constructed using the function
@{ML "(op ||)"}. The parser @{ML "(p || q)" for p q} returns the result of
@{text "p"} if it succeeds; otherwise it returns the result of @{text "q"}.
For example:

@{ML_response [display,gray]
"let
  val hw = ($$ \"h\") || ($$ \"w\")
  val input1 = (explode \"hello\")
  val input2 = (explode \"world\")
in
  (hw input1, hw input2)
end"
  "((\"h\", [\"e\", \"l\", \"l\", \"o\"]), (\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}

The functions @{ML "(op |--)"} and @{ML "(op --|)"} work like the sequencing function
for parsers, except that they discard the item being parsed by the first (respectively second)
parser. For example:

@{ML_response [display,gray]
"let
  val just_e = ($$ \"h\") |-- ($$ \"e\")
  val just_h = ($$ \"h\") --| ($$ \"e\")
  val input = (explode \"hello\")
in
  (just_e input, just_h input)
end"
  "((\"e\", [\"l\", \"l\", \"o\"]), (\"h\", [\"l\", \"l\", \"o\"]))"}

The function @{ML "(op !!)"} helps to produce appropriate error messages
during parsing. For example, if you want to parse @{text p} immediately
followed by @{text q}, or start a completely different parser @{text r},
you might write:

@{ML [display,gray] "(p -- q) || r" for p q r}

However, this parser is problematic for producing an appropriate error
message in case the parsing of @{ML "(p -- q)" for p q} fails, because then
you lose the information that @{text p} should be followed by @{text q}. To
see this, consider the case in which @{text p} is present in the input, but
not @{text q}. Then @{ML "(p -- q)" for p q} will fail and the alternative
parser @{text r} will be tried. However, in many circumstances this will be
the wrong parser for the input ``p-followed-by-q'' and therefore will also
fail. The error message is then caused by the failure of @{text r}, not by
the absence of @{text q} in the input. This kind of situation can be avoided
by using the function @{ML "(op !!)"}. This function aborts the whole process
of parsing in case of a failure and prints an error message. For example,
if you invoke the parser

@{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}

on @{text [quotes] "hello"}, the parsing succeeds

@{ML_response [display,gray]
"(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")"
  "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}

but if you invoke it on @{text [quotes] "world"}

@{ML_response_fake [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
  "Exception ABORT raised"}

then the parsing aborts and the error message @{text "foo"} is printed out. In order to
see the error message, we need to prefix the parser with the function
@{ML "Scan.error"}. For example:

@{ML_response_fake [display,gray] "Scan.error (!! (fn _ => \"foo\") ($$ \"h\"))"
  "Exception Error \"foo\" raised"}

This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"}
(see Section~\ref{sec:newcommand}, which explains this function in more detail).

Let us now return to our example of parsing @{ML "(p -- q) || r" for p q r}.
If you want to generate the correct error message for p-followed-by-q,
then you have to write:
*}

ML{*fun p_followed_by_q p q r =
let
  val err_msg = (fn _ => p ^ " is not followed by " ^ q)

  "((\"w\", \"w\"), [\"o\", \"r\", \"l\", \"d\"])"}

yields the expected parsing.

The function @{ML "Scan.repeat p" for p} will apply a parser @{text p} as
often as it succeeds. For example:

@{ML_response [display,gray] "Scan.repeat ($$ \"h\") (explode \"hhhhello\")"
  "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}

Note that @{ML "Scan.repeat"} stores the parsed items in a list. The function
@{ML "Scan.repeat1"} is similar, but requires that the parser @{text "p"}
succeeds at least once.
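
For example, applied to the same input, @{ML Scan.repeat1} should give the
same result, since at least one @{text [quotes] "h"} can be parsed (this is
an added illustration; the output shown is the expected one):

@{ML_response [display,gray] "Scan.repeat1 ($$ \"h\") (explode \"hhhhello\")"
  "([\"h\", \"h\", \"h\", \"h\"], [\"e\", \"l\", \"l\", \"o\"])"}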

Also note that the parser would have aborted with the exception @{text MORE},
if you had run it on just @{text [quotes] "hhhh"}. This can be avoided by using
the wrapper @{ML Scan.finite} and the ``stopper-token'' @{ML Symbol.stopper}.
With them you can write:

@{ML_response [display,gray] "Scan.finite Symbol.stopper (Scan.repeat ($$ \"h\")) (explode \"hhhh\")"
  "([\"h\", \"h\", \"h\", \"h\"], [])"}

@{ML Symbol.stopper} is the ``end-of-input'' indicator for parsing strings;
other stoppers need to be used when parsing tokens, for example. However, this
kind of manual wrapping is often already done by the surrounding infrastructure.

The function @{ML Scan.repeat} can be used with @{ML Scan.one} to read any
string as in

  Scan.finite Symbol.stopper p input2)
end"
"(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
 ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}

After parsing is done, you nearly always want to apply a function to the parsed
items. To do this the function @{ML "(p >> f)" for p f} can be employed, which
first runs the parser @{text p} and upon successful completion applies the
function @{text f} to the result. For example:

@{ML_response [display,gray]
"let
  fun double (x, y) = (x ^ x, y ^ y)
in
  (($$ \"h\") -- ($$ \"e\") >> double) (explode \"hello\")
end"
  "((\"hh\", \"ee\"), [\"l\", \"l\", \"o\"])"}
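
The function given to @{ML "(op >>)"} can also be used to flatten the nested
pairs that @{ML "(op --)"} produces. The following is an added illustration
(the output shown is the expected one):

@{ML_response [display,gray]
"let
  fun flat ((x, y), z) = [x, y, z]
in
  (($$ \"h\") -- ($$ \"e\") -- ($$ \"l\") >> flat) (explode \"hello\")
end"
  "([\"h\", \"e\", \"l\"], [\"l\", \"o\"])"}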

*}

section {* Parsing Theory Syntax *}

text {*
Most of the time, however, Isabelle developers have to deal with parsing
tokens, not strings. This is because the parsers for the theory syntax, as
well as the parsers for the argument syntax of proof methods and attributes,
use the type @{ML_type OuterLex.token} (which is identical to the type
@{ML_type OuterParse.token}). There are also parsers for ML-expressions and
ML-files, which can sometimes be handy.

\begin{readmore}
The parser functions for the theory syntax are contained in the structure
@{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
@{ML "Command" in OuterLex} for commands). Some token parsers take into account the
kind of tokens.
*}

text {*
For the examples below, you can generate a token list out of a string using
the function @{ML "OuterSyntax.scan"}, which is given @{ML "Position.none"}
as argument since, at the moment, we are not interested in generating
precise error messages. The following code

@{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\""
"[Token (\<dots>,(Ident, \"hello\"),\<dots>),
 Token (\<dots>,(Space, \" \"),\<dots>),
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

produces three tokens where the first and the last are identifiers, since
@{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
other syntactic category.\footnote{Note that because of a possible bug in
the PolyML runtime system the result is printed as @{text [quotes] "?"}, instead of
the tokens.} The second indicates a space.

Many parsing functions later on will require spaces, comments and the like
to have already been filtered out. So from now on we are going to use the
functions @{ML filter} and @{ML OuterLex.is_proper} to do this. For example:

@{ML_response_fake [display,gray]
"let
  val input = OuterSyntax.scan Position.none \"hello world\"
in
  filter OuterLex.is_proper input
end"
"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}

For convenience we define the function:

*}

ML{*fun filtered_input str =
  filter OuterLex.is_proper (OuterSyntax.scan Position.none str) *}

text {*

If you now parse

@{ML_response_fake [display,gray]
"filtered_input \"inductive | for\""
"[Token (\<dots>,(Command, \"inductive\"),\<dots>),
 Token (\<dots>,(Keyword, \"|\"),\<dots>),
 Token (\<dots>,(Keyword, \"for\"),\<dots>)]"}

you obtain a list consisting of only a command and two keyword tokens.
If you want to see which keywords and commands are currently known, type in
the following code (you might have to adjust the @{ML print_depth} in order to
see the complete list):

@{ML_response_fake [display,gray]
in
  (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
end"
"([\"}\",\"{\",\<dots>],[\"\<rightleftharpoons>\",\"\<leftharpoondown>\",\<dots>])"}

The parser @{ML "OuterParse.$$$"} parses a single keyword. For example:

@{ML_response [display,gray]
"let
  val input1 = filtered_input \"where for\"
  val input2 = filtered_input \"| in\"
in
  (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
end"
  "((\"where\",\<dots>),(\"|\",\<dots>))"}

Like before, you can sequentially connect parsers with @{ML "(op --)"}. For example:

@{ML_response [display,gray]
"let
  val input = filtered_input \"| in\"
in
  (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") input
end"
  "((\"|\",\"in\"),[])"}

The parser @{ML "OuterParse.enum s p" for s p} parses a possibly empty
list of items recognised by the parser @{text p}, where the items being parsed
are separated by the string @{text s}. For example:

@{ML_response [display,gray]
"let
  val input = filtered_input \"in | in | in foo\"
in
  (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end"
  "([\"in\",\"in\",\"in\"],[\<dots>])"}

@{ML "OuterParse.enum1"} works similarly, except that the parsed list must
be non-empty.
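
For instance, on the same input as above, @{ML "OuterParse.enum1"} should
return the same result; this is an added illustration and the output shown
is the expected one:

@{ML_response [display,gray]
"let
  val input = filtered_input \"in | in | in foo\"
in
  (OuterParse.enum1 \"|\" (OuterParse.$$$ \"in\")) input
end"
  "([\"in\",\"in\",\"in\"],[\<dots>])"}
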
Note that we had to add a string @{text [quotes] "foo"} at the end of the
parsed string, otherwise the parser would have consumed all tokens and then
failed with the exception @{text "MORE"}. Like in the previous section, we
can avoid this exception by using the wrapper @{ML Scan.finite}. This time,
however, we have to use the ``stopper-token'' @{ML OuterLex.stopper}. We can
write:

@{ML_response [display,gray]
"let
  val input = filtered_input \"in | in | in\"
in

text {* (FIXME funny output for a proposition) *}

section {* Parsing Specifications *}

text {*
There are a number of special purpose parsers that help with parsing
specifications of functions, inductive definitions and so on. For example,
the function @{ML OuterParse.target} reads a target in order to indicate
a locale.

@{ML_response [display,gray]
"let
  val input = filtered_input \"(in test)\"
in
  parse OuterParse.target input
end" "(\"test\",[])"}

The function @{ML OuterParse.opt_target} makes this parser ``optional'', that
is, it wraps the result into an option type and returns @{ML NONE} if no
target is present.
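
For instance, reusing the input from above, one would expect the following
(an added sketch that relies on the behaviour just described):

@{ML_response [display,gray]
"let
  val input = filtered_input \"(in test)\"
in
  parse OuterParse.opt_target input
end" "(SOME \"test\",[])"}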

The function @{ML OuterParse.fixes} reads a list of constants
that can include type annotations and syntax translations.
The list is separated by \isacommand{and}. For example:
*}

text {*

@{ML_response [display,gray]
"let
  val input = filtered_input
    \"foo::\\\"int \<Rightarrow> bool\\\" (\\\"FOO\\\" [100] 100) and bar::nat and blonk\"
in
  parse OuterParse.fixes input
end"
  "([(foo, SOME \<dots>, Mixfix (\"FOO\",[100],100)),
    (bar, SOME \<dots>, NoSyn),
    (blonk, NONE, NoSyn)],[])"}

Whenever types are given, they are stored in the @{ML SOME}s.
Note that we had to write @{text "\\\"int \<Rightarrow> bool\\\""} to properly escape
the double quotes in the compound type.

@{ML OuterParse.for_fixes} is an ``optional'' parser that prefixes
@{ML OuterParse.fixes} with the command \isacommand{for}. For example:

@{ML_response [display,gray]
"let
  val input = filtered_input
    \"for foo::\\\"int \<Rightarrow> bool\\\" (\\\"FOO\\\" [100] 100) and bar::nat and blonk\"
in
  parse OuterParse.for_fixes input
end"
  "([(foo, SOME \<dots>, Mixfix (\"FOO\",[100],100)),
    (bar, SOME \<dots>, NoSyn),
    (blonk, NONE, NoSyn)],[])"}
*}

ML{*let
  val input = filtered_input "test_lemma[intro,foo,elim,dest!,bar]:"
in
  parse (SpecParse.thm_name ":") input
a list of predicate constants with optional type-annotation and
optional syntax-annotation; a list of for-fixes (fixed parameters);
and a list of rules where each rule has optionally a name and an attribute.
*}

section {* New Commands and Keyword Files\label{sec:newcommand} *}

text {*
Often new commands, for example for providing new definitional principles,
need to be implemented. While this is not difficult on the ML-level,
new commands, in order to be useful, need to be recognised by
ProofGeneral. This results in some subtle configuration issues, which we
will explain in this section.

To keep things simple, let us start with a ``silly'' command that does nothing
at all. We shall name this command \isacommand{foobar}. On the ML-level it can be
defined as:
*}

ML{*let
  val do_nothing = Scan.succeed (Toplevel.theory I)
  val kind = OuterKeyword.thy_decl
\isacommand{foobar}-command. If you target other logics besides HOL, such
as Nominal or ZF, then you need to adapt the log files appropriately.
@{text Pure} and @{text HOL} are usually compiled during the installation of
Isabelle. So log files for them should already be available. If not, then
they can be conveniently compiled with the help of the build-script from the Isabelle
distribution.

@{text [display]
"$ ./build -m \"Pure\"
$ ./build -m \"HOL\""}

The @{text "Pure-ProofGeneral"} theory needs to be compiled with:

@{text [display] "$ ./build -m \"Pure-ProofGeneral\" \"Pure\""}

For the theory @{text "Command.thy"}, you first need to create a ``managed'' subdirectory
with:

@{text [display] "$ isabelle mkdir FoobarCommand"}

This generates a directory containing the files:

@{text [display]
"./IsaMakefile
./FoobarCommand/ROOT.ML
./FoobarCommand/document
the string @{text "foobar"} twice.\footnote{To see whether things are fine, check
that @{text "grep foobar"} on this file returns something
non-empty.} This keyword file needs to
be copied into the directory @{text "~/.isabelle/etc"}. To make Isabelle
aware of this keyword file, you have to start Isabelle with the option @{text
"-k foobar"}, that is:

@{text [display] "$ isabelle emacs -k foobar a_theory_file"}

If you now build a theory on top of @{text "Command.thy"},
then the command \isacommand{foobar} can be used.
Similarly with any other new command.

At the moment \isacommand{foobar} is not very useful. Let us refine it a bit
next by letting it take a proposition as argument and printing this proposition
inside the tracing buffer.

The crucial part of a command is the function that determines the behaviour
of the command. In the code above we used a ``do-nothing''-function, which
because of @{ML Scan.succeed} does not parse any argument, but immediately
returns the simple toplevel function @{ML "Toplevel.theory I"}. We can
replace this code by a function that first parses a proposition (using the
parser @{ML OuterParse.prop}), then prints out the tracing
information (using a new top-level function @{text trace_top_lvl}) and
finally does nothing. For this you can write:
*}

ML{*let
  fun trace_top_lvl str =
    Toplevel.theory (fn thy => (tracing str; thy))

  OuterSyntax.command "foobar" "proving a proposition" kind prove_prop
end *}

text {*
The function @{text set_up_thm} in Lines 2 to 7 takes a string (the proposition to be
proved) and a context as argument. The context is necessary in order to be able to use
@{ML Syntax.read_prop}, which converts a string into a proper proposition.
In Line 6 the function @{ML Proof.theorem_i} starts the proof for the
proposition. Its argument @{ML NONE} stands for a locale (which we chose to
omit); the argument @{ML "(K I)"} stands for a function that determines what
should be done with the theorem once it is proved (we chose to just forget
about it). Lines 9 to 11 contain the parser for the proposition.

(FIXME: explain @{ML Toplevel.print} etc)

If you now type \isacommand{foobar}~@{text [quotes] "True \<and> True"}, you obtain the following
proof state:

\begin{isabelle}
\isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
@{text "goal (1 subgoal):"}\\
@{text "1. True \<and> True"}
\end{isabelle}

and you can build the proof

\begin{isabelle}
\isacommand{foobar}~@{text [quotes] "True \<and> True"}\\
\isacommand{apply}@{text "(rule conjI)"}\\
\isacommand{apply}@{text "(rule TrueI)+"}\\