diff -r 90189a97b3f8 -r ebbd0dd008c8 CookBook/Package/Ind_Interface.thy --- a/CookBook/Package/Ind_Interface.thy Wed Jan 28 06:43:51 2009 +0000 +++ b/CookBook/Package/Ind_Interface.thy Thu Jan 29 09:46:17 2009 +0000 @@ -2,76 +2,82 @@ imports "../Base" Simple_Inductive_Package begin -(*<*) -ML {* -structure SIP = SimpleInductivePackage -*} -(*>*) - -section{* The interface *} +section{* The Interface \label{sec:ind-interface} *} text {* -\label{sec:ind-interface} -In order to add a new inductive predicate to a theory with the help of our package, the user -must \emph{invoke} it. For every package, there are essentially two different ways of invoking -it, which we will refer to as \emph{external} and \emph{internal}. By external -invocation we mean that the package is called from within a theory document. In this case, -the type of the inductive predicate, as well as its introduction rules, are given as strings -by the user. Before the package can actually make the definition, the type and introduction -rules have to be parsed. In contrast, internal invocation means that the package is called -by some other package. For example, the function definition package \cite{Krauss-IJCAR06} -calls the inductive definition package to define the graph of the function. However, it is -not a good idea for the function definition package to pass the introduction rules for the -function graph to the inductive definition package as strings. In this case, it is better -to directly pass the rules to the package as a list of terms, which is more robust than -handling strings that are lacking the additional structure of terms. These two ways of -invoking the package are reflected in its ML programming interface, which consists of two -functions: -@{ML_chunk [display] SIMPLE_INDUCTIVE_PACKAGE} + In order to add a new inductive predicate to a theory with the help of our + package, the user must \emph{invoke} it. For every package, there are + essentially two different ways of invoking it, which we will refer to as + \emph{external} and \emph{internal}. By external invocation we mean that the + package is called from within a theory document. In this case, the type of + the inductive predicate, as well as its introduction rules, are given as + strings by the user. Before the package can actually make the definition, + the type and introduction rules have to be parsed. In contrast, internal + invocation means that the package is called by some other package. For + example, the function definition package \cite{Krauss-IJCAR06} calls the + inductive definition package to define the graph of the function. However, + it is not a good idea for the function definition package to pass the + introduction rules for the function graph to the inductive definition + package as strings. In this case, it is better to directly pass the rules to + the package as a list of terms, which is more robust than handling strings + that are lacking the additional structure of terms. These two ways of + invoking the package are reflected in its ML programming interface, which + consists of two functions: + + @{ML_chunk [display] SIMPLE_INDUCTIVE_PACKAGE} *} text {* -The function for external invocation of the package is called @{ML add_inductive in SIP}, -whereas the one for internal invocation is called @{ML add_inductive_i in SIP}. Both -of these functions take as arguments the names and types of the inductive predicates, the -names and types of their parameters, the actual introduction rules and a \emph{local theory}. -They return a local theory containing the definition, together with a tuple containing -the introduction and induction rules, which are stored in the local theory, too. -In contrast to an ordinary theory, which simply consists of a type signature, as -well as tables for constants, axioms and theorems, a local theory also contains -additional context information, such as locally fixed variables and local assumptions -that may be used by the package. The type @{ML_type local_theory} is identical to the -type of \emph{proof contexts} @{ML_type "Proof.context"}, although not every proof context -constitutes a valid local theory. -Note that @{ML add_inductive_i in SIP} expects the types -of the predicates and parameters to be specified using the datatype @{ML_type typ} of Isabelle's -logical framework, whereas @{ML add_inductive in SIP} -expects them to be given as optional strings. If no string is -given for a particular predicate or parameter, this means that the type should be -inferred by the package. Additional \emph{mixfix syntax} may be associated with -the predicates and parameters as well. Note that @{ML add_inductive_i in SIP} does not -allow mixfix syntax to be associated with parameters, since it can only be used -for parsing. The names of the predicates, parameters and rules are represented by the -type @{ML_type Binding.binding}. Strings can be turned into elements of the type -@{ML_type Binding.binding} using the function -@{ML [display] "Binding.name : string -> Binding.binding"} -Each introduction rule is given as a tuple containing its name, a list of \emph{attributes} -and a logical formula. Note that the type @{ML_type Attrib.binding} used in the list of -introduction rules is just a shorthand for the type @{ML_type "Binding.binding * Attrib.src list"}. -The function @{ML add_inductive_i in SIP} expects the formula to be specified using the datatype -@{ML_type term}, whereas @{ML add_inductive in SIP} expects it to be given as a string. -An attribute specifies additional actions and transformations that should be applied to -a theorem, such as storing it in the rule databases used by automatic tactics -like the simplifier. The code of the package, which will be described in the following -section, will mostly treat attributes as a black box and just forward them to other -functions for storing theorems in local theories. -The implementation of the function @{ML add_inductive in SIP} for external invocation -of the package is quite simple. Essentially, it just parses the introduction rules -and then passes them on to @{ML add_inductive_i in SIP}: -@{ML_chunk [display] add_inductive} -For parsing and type checking the introduction rules, we use the function -@{ML [display] "Specification.read_specification: + The function for external invocation of the package is called @{ML + add_inductive in SimpleInductivePackage}, whereas the one for internal + invocation is called @{ML add_inductive_i in SimpleInductivePackage}. Both + of these functions take as arguments the names and types of the inductive + predicates, the names and types of their parameters, the actual introduction + rules and a \emph{local theory}. They return a local theory containing the + definition, together with a tuple containing the introduction and induction + rules, which are stored in the local theory, too. In contrast to an + ordinary theory, which simply consists of a type signature, as well as + tables for constants, axioms and theorems, a local theory also contains + additional context information, such as locally fixed variables and local + assumptions that may be used by the package. The type @{ML_type + local_theory} is identical to the type of \emph{proof contexts} @{ML_type + "Proof.context"}, although not every proof context constitutes a valid local + theory. Note that @{ML add_inductive_i in SimpleInductivePackage} expects + the types of the predicates and parameters to be specified using the + datatype @{ML_type typ} of Isabelle's logical framework, whereas @{ML + add_inductive in SimpleInductivePackage} expects them to be given as + optional strings. If no string is given for a particular predicate or + parameter, this means that the type should be inferred by the + package. Additional \emph{mixfix syntax} may be associated with the + predicates and parameters as well. Note that @{ML add_inductive_i in + SimpleInductivePackage} does not allow mixfix syntax to be associated with + parameters, since it can only be used for parsing. The names of the + predicates, parameters and rules are represented by the type @{ML_type + Binding.binding}. Strings can be turned into elements of the type @{ML_type + Binding.binding} using the function @{ML [display] "Binding.name : string -> + Binding.binding"} Each introduction rule is given as a tuple containing its + name, a list of \emph{attributes} and a logical formula. Note that the type + @{ML_type Attrib.binding} used in the list of introduction rules is just a + shorthand for the type @{ML_type "Binding.binding * Attrib.src list"}. The + function @{ML add_inductive_i in SimpleInductivePackage} expects the formula + to be specified using the datatype @{ML_type term}, whereas @{ML + add_inductive in SimpleInductivePackage} expects it to be given as a string. + An attribute specifies additional actions and transformations that should be + applied to a theorem, such as storing it in the rule databases used by + automatic tactics like the simplifier. The code of the package, which will + be described in the following section, will mostly treat attributes as a + black box and just forward them to other functions for storing theorems in + local theories. The implementation of the function @{ML add_inductive in + SimpleInductivePackage} for external invocation of the package is quite + simple. Essentially, it just parses the introduction rules and then passes + them on to @{ML add_inductive_i in SimpleInductivePackage}: + + @{ML_chunk [display] add_inductive} + + For parsing and type checking the introduction rules, we use the function + + @{ML [display] "Specification.read_specification: (Binding.binding * string option * mixfix) list -> (*{variables}*) (Attrib.binding * string list) list list -> (*{rules}*) local_theory -> @@ -81,32 +87,33 @@ *} text {* -During parsing, both predicates and parameters are treated as variables, so -the lists \verb!preds_syn! and \verb!params_syn! are just appended -before being passed to @{ML read_specification in Specification}. Note that the format -for rules supported by @{ML read_specification in Specification} is more general than -what is required for our package. It allows several rules to be associated -with one name, and the list of rules can be partitioned into several -sublists. In order for the list \verb!intro_srcs! of introduction rules -to be acceptable as an input for @{ML read_specification in Specification}, we first -have to turn it into a list of singleton lists. This transformation -has to be reversed later on by applying the function -@{ML [display] "the_single: 'a list -> 'a"} -to the list \verb!specs! containing the parsed introduction rules. -The function @{ML read_specification in Specification} also returns the list \verb!vars! -of predicates and parameters that contains the inferred types as well. -This list has to be chopped into the two lists \verb!preds_syn'! and -\verb!params_syn'! for predicates and parameters, respectively. -All variables occurring in a rule but not in the list of variables passed to -@{ML read_specification in Specification} will be bound by a meta-level universal -quantifier. + During parsing, both predicates and parameters are treated as variables, so + the lists \verb!preds_syn! and \verb!params_syn! are just appended + before being passed to @{ML read_specification in Specification}. Note that the format + for rules supported by @{ML read_specification in Specification} is more general than + what is required for our package. It allows several rules to be associated + with one name, and the list of rules can be partitioned into several + sublists. In order for the list \verb!intro_srcs! of introduction rules + to be acceptable as an input for @{ML read_specification in Specification}, we first + have to turn it into a list of singleton lists. This transformation + has to be reversed later on by applying the function + @{ML [display] "the_single: 'a list -> 'a"} + to the list \verb!specs! containing the parsed introduction rules. + The function @{ML read_specification in Specification} also returns the list \verb!vars! + of predicates and parameters that contains the inferred types as well. + This list has to be chopped into the two lists \verb!preds_syn'! and + \verb!params_syn'! for predicates and parameters, respectively. + All variables occurring in a rule but not in the list of variables passed to + @{ML read_specification in Specification} will be bound by a meta-level universal + quantifier. *} text {* -Finally, @{ML read_specification in Specification} also returns another local theory, -but we can safely discard it. As an example, let us look at how we can use this -function to parse the introduction rules of the @{text trcl} predicate: -@{ML_response [display] + Finally, @{ML read_specification in Specification} also returns another local theory, + but we can safely discard it. As an example, let us look at how we can use this + function to parse the introduction rules of the @{text trcl} predicate: + + @{ML_response [display] "Specification.read_specification [(Binding.name \"trcl\", NONE, NoSyn), (Binding.name \"r\", SOME \"'a \ 'a \ bool\", NoSyn)] @@ -129,27 +136,30 @@ \) : (((Binding.binding * typ) * mixfix) list * (Attrib.binding * term list) list) * local_theory"} -In the list of variables passed to @{ML read_specification in Specification}, we have -used the mixfix annotation @{ML NoSyn} to indicate that we do not want to associate any -mixfix syntax with the variable. Moreover, we have only specified the type of \texttt{r}, -whereas the type of \texttt{trcl} is computed using type inference. -The local variables \texttt{x}, \texttt{y} and \texttt{z} of the introduction rules -are turned into bound variables with the de Bruijn indices, -whereas \texttt{trcl} and \texttt{r} remain free variables. + + In the list of variables passed to @{ML read_specification in Specification}, we have + used the mixfix annotation @{ML NoSyn} to indicate that we do not want to associate any + mixfix syntax with the variable. Moreover, we have only specified the type of \texttt{r}, + whereas the type of \texttt{trcl} is computed using type inference. + The local variables \texttt{x}, \texttt{y} and \texttt{z} of the introduction rules + are turned into bound variables with the de Bruijn indices, + whereas \texttt{trcl} and \texttt{r} remain free variables. + *} text {* -\paragraph{Parsers for theory syntax} + + \paragraph{Parsers for theory syntax} -Although the function @{ML add_inductive in SIP} parses terms and types, it still -cannot be used to invoke the package directly from within a theory document. -In order to do this, we have to write another parser. Before we describe -the process of writing parsers for theory syntax in more detail, we first -show some examples of how we would like to use the inductive definition -package. + Although the function @{ML add_inductive in SimpleInductivePackage} parses terms and types, it still + cannot be used to invoke the package directly from within a theory document. + In order to do this, we have to write another parser. Before we describe + the process of writing parsers for theory syntax in more detail, we first + show some examples of how we would like to use the inductive definition + package. -\noindent -The definition of the transitive closure should look as follows: + + The definition of the transitive closure should look as follows: *} simple_inductive @@ -188,10 +198,7 @@ qed (*>*) -text {* -\noindent -Even and odd numbers can be defined by -*} +text {* Even and odd numbers can be defined by *} simple_inductive even and odd @@ -208,10 +215,7 @@ thm even_odd.intros (*>*) -text {* -\noindent -The accessible part of a relation can be introduced as follows: -*} +text {* The accessible part of a relation can be introduced as follows: *} simple_inductive accpart for r :: "'a \ 'a \ bool" @@ -224,9 +228,8 @@ (*>*) text {* -\noindent -Moreover, it should also be possible to define the accessible part -inside a locale fixing the relation @{text r}: + Moreover, it should also be possible to define the accessible part + inside a locale fixing the relation @{text r}: *} locale rel = @@ -247,8 +250,7 @@ thm rel.accpartI' thm rel.accpart'.induct -ML {* -val (result, lthy) = SimpleInductivePackage.add_inductive +ML{*val (result, lthy) = SimpleInductivePackage.add_inductive [(Binding.name "trcl'", NONE, NoSyn)] [(Binding.name "r", SOME "'a \ 'a \ bool", NoSyn)] [((Binding.name "base", []), "\x. trcl' r x x"), ((Binding.name "step", []), "\x y z. trcl' r x y \ r y z \ trcl' r x z")] (TheoryTarget.init NONE @{theory}) @@ -256,218 +258,229 @@ (*>*) text {* -\noindent -In this context, it is important to note that Isabelle distinguishes -between \emph{outer} and \emph{inner} syntax. Theory commands such as -\isa{\isacommand{simple{\isacharunderscore}inductive} $\ldots$ \isacommand{for} $\ldots$ \isacommand{where} $\ldots$} -belong to the outer syntax, whereas items in quotation marks, in particular -terms such as @{text [source] "trcl r x x"} and types such as -@{text [source] "'a \ 'a \ bool"} belong to the inner syntax. -Separating the two layers of outer and inner syntax greatly simplifies -matters, because the parser for terms and types does not have to know -anything about the possible syntax of theory commands, and the parser -for theory commands need not be concerned about the syntactic structure -of terms and types. -\medskip -\noindent -The syntax of the \isa{\isacommand{simple{\isacharunderscore}inductive}} command -can be described by the following railroad diagram: -\begin{rail} + In this context, it is important to note that Isabelle distinguishes + between \emph{outer} and \emph{inner} syntax. Theory commands such as + \isa{\isacommand{simple{\isacharunderscore}inductive} $\ldots$ \isacommand{for} $\ldots$ \isacommand{where} $\ldots$} + belong to the outer syntax, whereas items in quotation marks, in particular + terms such as @{text [source] "trcl r x x"} and types such as + @{text [source] "'a \ 'a \ bool"} belong to the inner syntax. + Separating the two layers of outer and inner syntax greatly simplifies + matters, because the parser for terms and types does not have to know + anything about the possible syntax of theory commands, and the parser + for theory commands need not be concerned about the syntactic structure + of terms and types. + + \medskip + \noindent + The syntax of the \isa{\isacommand{simple{\isacharunderscore}inductive}} command + can be described by the following railroad diagram: + \begin{rail} 'simple\_inductive' target? fixes ('for' fixes)? \\ ('where' (thmdecl? prop + '|'))? ; -\end{rail} - -\paragraph{Functional parsers} + \end{rail} -For parsing terms and types, Isabelle uses a rather general and sophisticated -algorithm due to Earley, which is driven by \emph{priority grammars}. -In contrast, parsers for theory syntax are built up using a set of combinators. -Functional parsing using combinators is a well-established technique, which -has been described by many authors, including Paulson \cite{paulson-ML-91} -and Wadler \cite{Wadler-AFP95}. -The central idea is that a parser is a function of type @{ML_type "'a list -> 'b * 'a list"}, -where @{ML_type "'a"} is a type of \emph{tokens}, and @{ML_type "'b"} is a type for -encoding items that the parser has recognized. When a parser is applied to a -list of tokens whose prefix it can recognize, it returns an encoding of the -prefix as an element of type @{ML_type "'b"}, together with the suffix of the list -containing the remaining tokens. Otherwise, the parser raises an exception -indicating a syntax error. The library for writing functional parsers in -Isabelle can roughly be split up into two parts. The first part consists of a -collection of generic parser combinators that are contained in the structure -@{ML_struct Scan} defined in the file @{ML_file "Pure/General/scan.ML"} in the Isabelle -sources. While these combinators do not make any assumptions about the concrete -structure of the tokens used, the second part of the library consists of combinators -for dealing with specific token types. -The following is an excerpt from the signature of @{ML_struct Scan}: + \paragraph{Functional parsers} -\begin{table} -@{ML "|| : ('a -> 'b) * ('a -> 'b) -> 'a -> 'b"} \\ -@{ML "-- : ('a -> 'b * 'c) * ('c -> 'd * 'e) -> 'a -> ('b * 'd) * 'e"} \\ -@{ML "|-- : ('a -> 'b * 'c) * ('c -> 'd * 'e) -> 'a -> 'd * 'e"} \\ -@{ML "--| : ('a -> 'b * 'c) * ('c -> 'd * 'e) -> 'a -> 'b * 'e"} \\ -@{ML "optional: ('a -> 'b * 'a) -> 'b -> 'a -> 'b * 'a" in Scan} \\ -@{ML "repeat: ('a -> 'b * 'a) -> 'a -> 'b list * 'a" in Scan} \\ -@{ML "repeat1: ('a -> 'b * 'a) -> 'a -> 'b list * 'a" in Scan} \\ -@{ML ">> : ('a -> 'b * 'c) * ('b -> 'd) -> 'a -> 'd * 'c"} \\ -@{ML "!! : ('a * string option -> string) -> ('a -> 'b) -> 'a -> 'b"} -\end{table} -Interestingly, the functions shown above are so generic that they do not -even rely on the input and output of the parser being a list of tokens. -If \texttt{p} succeeds, i.e.\ does not raise an exception, the parser -@{ML "p || q" for p q} returns the result of \texttt{p}, otherwise it returns -the result of \texttt{q}. The parser @{ML "p -- q" for p q} first parses an -item of type @{ML_type "'b"} using \texttt{p}, then passes the remaining tokens -of type @{ML_type "'c"} to \texttt{q}, which parses an item of type @{ML_type "'d"} -and returns the remaining tokens of type @{ML_type "'e"}, which are finally -returned together with a pair of type @{ML_type "'b * 'd"} containing the two -parsed items. The parsers @{ML "p |-- q" for p q} and @{ML "p --| q" for p q} -work in a similar way as the previous one, with the difference that they -discard the item parsed by the first and the second parser, respectively. -If \texttt{p} succeeds, the parser @{ML "optional p x" for p x in Scan} returns the result -of \texttt{p}, otherwise it returns the default value \texttt{x}. The parser -@{ML "repeat p" for p in Scan} applies \texttt{p} as often as it can, returning a possibly -empty list of parsed items. The parser @{ML "repeat1 p" for p in Scan} is similar, -but requires \texttt{p} to succeed at least once. The parser -@{ML "p >> f" for p f} uses \texttt{p} to parse an item of type @{ML_type "'b"}, to which -it applies the function \texttt{f} yielding a value of type @{ML_type "'d"}, which -is returned together with the remaining tokens of type @{ML_type "'c"}. -Finally, @{ML "!!"} is used for transforming exceptions produced by parsers. -If \texttt{p} raises an exception indicating that it cannot parse a given input, -then an enclosing parser such as -@{ML [display] "q -- p || r" for p q r} -will try the alternative parser \texttt{r}. By writing -@{ML [display] "q -- !! err p || r" for err p q r} -instead, one can achieve that a failure of \texttt{p} causes the whole parser to abort. -The @{ML "!!"} operator is similar to the \emph{cut} operator in Prolog, which prevents -the interpreter from backtracking. The \texttt{err} function supplied as an argument -to @{ML "!!"} can be used to produce an error message depending on the current -state of the parser, as well as the optional error message returned by \texttt{p}. + For parsing terms and types, Isabelle uses a rather general and sophisticated + algorithm due to Earley, which is driven by \emph{priority grammars}. + In contrast, parsers for theory syntax are built up using a set of combinators. + Functional parsing using combinators is a well-established technique, which + has been described by many authors, including Paulson \cite{paulson-ML-91} + and Wadler \cite{Wadler-AFP95}. + The central idea is that a parser is a function of type @{ML_type "'a list -> 'b * 'a list"}, + where @{ML_type "'a"} is a type of \emph{tokens}, and @{ML_type "'b"} is a type for + encoding items that the parser has recognized. When a parser is applied to a + list of tokens whose prefix it can recognize, it returns an encoding of the + prefix as an element of type @{ML_type "'b"}, together with the suffix of the list + containing the remaining tokens. Otherwise, the parser raises an exception + indicating a syntax error. The library for writing functional parsers in + Isabelle can roughly be split up into two parts. The first part consists of a + collection of generic parser combinators that are contained in the structure + @{ML_struct Scan} defined in the file @{ML_file "Pure/General/scan.ML"} in the Isabelle + sources. While these combinators do not make any assumptions about the concrete + structure of the tokens used, the second part of the library consists of combinators + for dealing with specific token types. + The following is an excerpt from the signature of @{ML_struct Scan}: -So far, we have only looked at combinators that construct more complex parsers -from simpler parsers. In order for these combinators to be useful, we also need -some basic parsers. As an example, we consider the following two parsers -defined in @{ML_struct Scan}: - -\begin{table} -@{ML "one: ('a -> bool) -> 'a list -> 'a * 'a list" in Scan} \\ -@{ML "$$ : string -> string list -> string * string list"} -\end{table} + \begin{table} + @{ML "|| : ('a -> 'b) * ('a -> 'b) -> 'a -> 'b"} \\ + @{ML "-- : ('a -> 'b * 'c) * ('c -> 'd * 'e) -> 'a -> ('b * 'd) * 'e"} \\ + @{ML "|-- : ('a -> 'b * 'c) * ('c -> 'd * 'e) -> 'a -> 'd * 'e"} \\ + @{ML "--| : ('a -> 'b * 'c) * ('c -> 'd * 'e) -> 'a -> 'b * 'e"} \\ + @{ML "optional: ('a -> 'b * 'a) -> 'b -> 'a -> 'b * 'a" in Scan} \\ + @{ML "repeat: ('a -> 'b * 'a) -> 'a -> 'b list * 'a" in Scan} \\ + @{ML "repeat1: ('a -> 'b * 'a) -> 'a -> 'b list * 'a" in Scan} \\ + @{ML ">> : ('a -> 'b * 'c) * ('b -> 'd) -> 'a -> 'd * 'c"} \\ + @{ML "!! : ('a * string option -> string) -> ('a -> 'b) -> 'a -> 'b"} + \end{table} -The parser @{ML "one pred" for pred in Scan} parses exactly one token that -satisfies the predicate \texttt{pred}, whereas @{ML "$$ s" for s} only -accepts a token that equals the string \texttt{s}. Note that we can easily -express @{ML "$$ s" for s} using @{ML "one" in Scan}: -@{ML [display] "one (fn s' => s' = s)" for s in Scan} -As an example, let us look at how we can use @{ML "$$"} and @{ML "--"} to parse -the prefix ``\texttt{hello}'' of the character list ``\texttt{hello world}'': -@{ML_response [display] -"($$ \"h\" -- $$ \"e\" -- $$ \"l\" -- $$ \"l\" -- $$ \"o\") -[\"h\", \"e\", \"l\", \"l\", \"o\", \" \", \"w\", \"o\", \"r\", \"l\", \"d\"]" -"(((((\"h\", \"e\"), \"l\"), \"l\"), \"o\"), [\" \", \"w\", \"o\", \"r\", \"l\", \"d\"]) -: ((((string * string) * string) * string) * string) * string list"} -Most of the time, however, we will have to deal with tokens that are not just strings. -The parsers for the theory syntax, as well as the parsers for the argument syntax -of proof methods and attributes use the token type @{ML_type OuterParse.token}, -which is identical to @{ML_type OuterLex.token}. -The parser functions for the theory syntax are contained in the structure -@{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. -In our parser, we will use the following functions: - -\begin{table} -@{ML "$$$ : string -> token list -> string * token list" in OuterParse} \\ -@{ML "enum1: string -> (token list -> 'a * token list) -> token list -> - 'a list * token list" in OuterParse} \\ -@{ML "prop: token list -> string * token list" in OuterParse} \\ -@{ML "opt_target: token list -> string option * token list" in OuterParse} \\ -@{ML "fixes: token list -> - (Binding.binding * string option * mixfix) list * token list" in OuterParse} \\ -@{ML "for_fixes: token list -> - (Binding.binding * string option * mixfix) list * token list" in OuterParse} \\ -@{ML "!!! : (token list -> 'a) -> token list -> 'a" in OuterParse} -\end{table} + Interestingly, the functions shown above are so generic that they do not + even rely on the input and output of the parser being a list of tokens. + If \texttt{p} succeeds, i.e.\ does not raise an exception, the parser + @{ML "p || q" for p q} returns the result of \texttt{p}, otherwise it returns + the result of \texttt{q}. The parser @{ML "p -- q" for p q} first parses an + item of type @{ML_type "'b"} using \texttt{p}, then passes the remaining tokens + of type @{ML_type "'c"} to \texttt{q}, which parses an item of type @{ML_type "'d"} + and returns the remaining tokens of type @{ML_type "'e"}, which are finally + returned together with a pair of type @{ML_type "'b * 'd"} containing the two + parsed items. The parsers @{ML "p |-- q" for p q} and @{ML "p --| q" for p q} + work in a similar way as the previous one, with the difference that they + discard the item parsed by the first and the second parser, respectively. + If \texttt{p} succeeds, the parser @{ML "optional p x" for p x in Scan} returns the result + of \texttt{p}, otherwise it returns the default value \texttt{x}. The parser + @{ML "repeat p" for p in Scan} applies \texttt{p} as often as it can, returning a possibly + empty list of parsed items. The parser @{ML "repeat1 p" for p in Scan} is similar, + but requires \texttt{p} to succeed at least once. The parser + @{ML "p >> f" for p f} uses \texttt{p} to parse an item of type @{ML_type "'b"}, to which + it applies the function \texttt{f} yielding a value of type @{ML_type "'d"}, which + is returned together with the remaining tokens of type @{ML_type "'c"}. + Finally, @{ML "!!"} is used for transforming exceptions produced by parsers. + If \texttt{p} raises an exception indicating that it cannot parse a given input, + then an enclosing parser such as + @{ML [display] "q -- p || r" for p q r} + will try the alternative parser \texttt{r}. By writing + @{ML [display] "q -- !! err p || r" for err p q r} + instead, one can achieve that a failure of \texttt{p} causes the whole parser to abort. + The @{ML "!!"} operator is similar to the \emph{cut} operator in Prolog, which prevents + the interpreter from backtracking. The \texttt{err} function supplied as an argument + to @{ML "!!"} can be used to produce an error message depending on the current + state of the parser, as well as the optional error message returned by \texttt{p}. + + So far, we have only looked at combinators that construct more complex parsers + from simpler parsers. In order for these combinators to be useful, we also need + some basic parsers. As an example, we consider the following two parsers + defined in @{ML_struct Scan}: + + \begin{table} + @{ML "one: ('a -> bool) -> 'a list -> 'a * 'a list" in Scan} \\ + @{ML "$$ : string -> string list -> string * string list"} + \end{table} + + The parser @{ML "one pred" for pred in Scan} parses exactly one token that + satisfies the predicate \texttt{pred}, whereas @{ML "$$ s" for s} only + accepts a token that equals the string \texttt{s}. Note that we can easily + express @{ML "$$ s" for s} using @{ML "one" in Scan}: + @{ML [display] "one (fn s' => s' = s)" for s in Scan} + As an example, let us look at how we can use @{ML "$$"} and @{ML "--"} to parse + the prefix ``\texttt{hello}'' of the character list ``\texttt{hello world}'': + + @{ML_response [display] + "($$ \"h\" -- $$ \"e\" -- $$ \"l\" -- $$ \"l\" -- $$ \"o\") + [\"h\", \"e\", \"l\", \"l\", \"o\", \" \", \"w\", \"o\", \"r\", \"l\", \"d\"]" + "(((((\"h\", \"e\"), \"l\"), \"l\"), \"o\"), [\" \", \"w\", \"o\", \"r\", \"l\", \"d\"]) + : ((((string * string) * string) * string) * string) * string list"} -The parsers @{ML "$$$" in OuterParse} and @{ML "!!!" in OuterParse} are -defined using the parsers @{ML "one" in Scan} and @{ML "!!"} from -@{ML_struct Scan}. -The parser @{ML "enum1 s p" for s p in OuterParse} parses a non-emtpy list of items -recognized by the parser \texttt{p}, where the items are separated by \texttt{s}. -A proposition can be parsed using the function @{ML prop in OuterParse}. -Essentially, a proposition is just a string or an identifier, but using the -specific parser function @{ML prop in OuterParse} leads to more instructive -error messages, since the parser will complain that a proposition was expected -when something else than a string or identifier is found. -An optional locale target specification of the form \isa{(\isacommand{in}\ $\ldots$)} -can be parsed using @{ML opt_target in OuterParse}. -The lists of names of the predicates and parameters, together with optional -types and syntax, are parsed using the functions @{ML "fixes" in OuterParse} -and @{ML for_fixes in OuterParse}, respectively. -In addition, the following function from @{ML_struct SpecParse} for parsing -an optional theorem name and attribute, followed by a delimiter, will be useful: - -\begin{table} -@{ML "opt_thm_name: - string -> token list -> Attrib.binding * token list" in SpecParse} -\end{table} + Most of the time, however, we will have to deal with tokens that are not just strings. + The parsers for the theory syntax, as well as the parsers for the argument syntax + of proof methods and attributes use the token type @{ML_type OuterParse.token}, + which is identical to @{ML_type OuterLex.token}. + The parser functions for the theory syntax are contained in the structure + @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}. + In our parser, we will use the following functions: + + \begin{table} + @{ML "$$$ : string -> token list -> string * token list" in OuterParse} \\ + @{ML "enum1: string -> (token list -> 'a * token list) -> token list -> + 'a list * token list" in OuterParse} \\ + @{ML "prop: token list -> string * token list" in OuterParse} \\ + @{ML "opt_target: token list -> string option * token list" in OuterParse} \\ + @{ML "fixes: token list -> + (Binding.binding * string option * mixfix) list * token list" in OuterParse} \\ + @{ML "for_fixes: token list -> + (Binding.binding * string option * mixfix) list * token list" in OuterParse} \\ + @{ML "!!! : (token list -> 'a) -> token list -> 'a" in OuterParse} + \end{table} -We now have all the necessary tools to write the parser for our -\isa{\isacommand{simple{\isacharunderscore}inductive}} command: -@{ML_chunk [display] syntax} -The definition of the parser \verb!ind_decl! closely follows the railroad -diagram shown above. In order to make the code more readable, the structures -@{ML_struct OuterParse} and @{ML_struct OuterKeyword} are abbreviated by -\texttt{P} and \texttt{K}, respectively. Note how the parser combinator -@{ML "!!!" in OuterParse} is used: once the keyword \texttt{where} -has been parsed, a non-empty list of introduction rules must follow. -Had we not used the combinator @{ML "!!!" in OuterParse}, a -\texttt{where} not followed by a list of rules would have caused the parser -to respond with the somewhat misleading error message -\begin{verbatim} + The parsers @{ML "$$$" in OuterParse} and @{ML "!!!" in OuterParse} are + defined using the parsers @{ML "one" in Scan} and @{ML "!!"} from + @{ML_struct Scan}. + The parser @{ML "enum1 s p" for s p in OuterParse} parses a non-emtpy list of items + recognized by the parser \texttt{p}, where the items are separated by \texttt{s}. + A proposition can be parsed using the function @{ML prop in OuterParse}. + Essentially, a proposition is just a string or an identifier, but using the + specific parser function @{ML prop in OuterParse} leads to more instructive + error messages, since the parser will complain that a proposition was expected + when something else than a string or identifier is found. + An optional locale target specification of the form \isa{(\isacommand{in}\ $\ldots$)} + can be parsed using @{ML opt_target in OuterParse}. + The lists of names of the predicates and parameters, together with optional + types and syntax, are parsed using the functions @{ML "fixes" in OuterParse} + and @{ML for_fixes in OuterParse}, respectively. + In addition, the following function from @{ML_struct SpecParse} for parsing + an optional theorem name and attribute, followed by a delimiter, will be useful: + + \begin{table} + @{ML "opt_thm_name: + string -> token list -> Attrib.binding * token list" in SpecParse} + \end{table} + + We now have all the necessary tools to write the parser for our + \isa{\isacommand{simple{\isacharunderscore}inductive}} command: + + @{ML_chunk [display] syntax} + + The definition of the parser \verb!ind_decl! closely follows the railroad + diagram shown above. In order to make the code more readable, the structures + @{ML_struct OuterParse} and @{ML_struct OuterKeyword} are abbreviated by + \texttt{P} and \texttt{K}, respectively. Note how the parser combinator + @{ML "!!!" in OuterParse} is used: once the keyword \texttt{where} + has been parsed, a non-empty list of introduction rules must follow. + Had we not used the combinator @{ML "!!!" in OuterParse}, a + \texttt{where} not followed by a list of rules would have caused the parser + to respond with the somewhat misleading error message + + \begin{verbatim} Outer syntax error: end of input expected, but keyword where was found -\end{verbatim} -rather than with the more instructive message -\begin{verbatim} + \end{verbatim} + + rather than with the more instructive message + + \begin{verbatim} Outer syntax error: proposition expected, but terminator was found -\end{verbatim} -Once all arguments of the command have been parsed, we apply the function -@{ML add_inductive in SimpleInductivePackage}, which yields a local theory -transformer of type @{ML_type "local_theory -> local_theory"}. Commands in -Isabelle/Isar are realized by transition transformers of type -@{ML_type [display] "Toplevel.transition -> Toplevel.transition"} -We can turn a local theory transformer into a transition transformer by using -the function -@{ML [display] "Toplevel.local_theory : string option -> + \end{verbatim} + + Once all arguments of the command have been parsed, we apply the function + @{ML add_inductive in SimpleInductivePackage}, which yields a local theory + transformer of type @{ML_type "local_theory -> local_theory"}. Commands in + Isabelle/Isar are realized by transition transformers of type + @{ML_type [display] "Toplevel.transition -> Toplevel.transition"} + We can turn a local theory transformer into a transition transformer by using + the function + + @{ML [display] "Toplevel.local_theory : string option -> (local_theory -> local_theory) -> Toplevel.transition -> Toplevel.transition"} -which, apart from the local theory transformer, takes an optional name of a locale -to be used as a basis for the local theory. + + which, apart from the local theory transformer, takes an optional name of a locale + to be used as a basis for the local theory. -(FIXME : needs to be adjusted to new parser type) + (FIXME : needs to be adjusted to new parser type) -{\it -The whole parser for our command has type -@{ML_text [display] "OuterLex.token list -> + {\it + The whole parser for our command has type + @{ML_text [display] "OuterLex.token list -> (Toplevel.transition -> Toplevel.transition) * OuterLex.token list"} -which is abbreviated by @{ML_text OuterSyntax.parser_fn}. The new command can be added -to the system via the function -@{ML_text [display] "OuterSyntax.command : + which is abbreviated by @{ML_text OuterSyntax.parser_fn}. The new command can be added + to the system via the function + @{ML_text [display] "OuterSyntax.command : string -> string -> OuterKeyword.T -> OuterSyntax.parser_fn -> unit"} -which imperatively updates the parser table behind the scenes. } + which imperatively updates the parser table behind the scenes. } -In addition to the parser, this -function takes two strings representing the name of the command and a short description, -as well as an element of type @{ML_type OuterKeyword.T} describing which \emph{kind} of -command we intend to add. Since we want to add a command for declaring new concepts, -we choose the kind @{ML "OuterKeyword.thy_decl"}. Other kinds include -@{ML "OuterKeyword.thy_goal"}, which is similar to @{ML thy_decl in OuterKeyword}, -but requires the user to prove a goal before making the declaration, or -@{ML "OuterKeyword.diag"}, which corresponds to a purely diagnostic command that does -not change the context. For example, the @{ML thy_goal in OuterKeyword} kind is used -by the \isa{\isacommand{function}} command \cite{Krauss-IJCAR06}, which requires the user -to prove that a given set of equations is non-overlapping and covers all cases. The kind -of the command should be chosen with care, since selecting the wrong one can cause strange -behaviour of the user interface, such as failure of the undo mechanism. + In addition to the parser, this + function takes two strings representing the name of the command and a short description, + as well as an element of type @{ML_type OuterKeyword.T} describing which \emph{kind} of + command we intend to add. Since we want to add a command for declaring new concepts, + we choose the kind @{ML "OuterKeyword.thy_decl"}. Other kinds include + @{ML "OuterKeyword.thy_goal"}, which is similar to @{ML thy_decl in OuterKeyword}, + but requires the user to prove a goal before making the declaration, or + @{ML "OuterKeyword.diag"}, which corresponds to a purely diagnostic command that does + not change the context. For example, the @{ML thy_goal in OuterKeyword} kind is used + by the \isa{\isacommand{function}} command \cite{Krauss-IJCAR06}, which requires the user + to prove that a given set of equations is non-overlapping and covers all cases. The kind + of the command should be chosen with care, since selecting the wrong one can cause strange + behaviour of the user interface, such as failure of the undo mechanism. *} (*<*)