isabelle-cookbook: comparison CookBook/Parsing.thy

equal deleted inserted replaced

-:b11653b11bd3
+:cd612b489504
 theory Parsing
 imports Base
 begin
 chapter {* Parsing *}
 text {*
 as terms, types and so on, belong to the inner syntax. For parsing inner syntax,
 Isabelle uses a rather general and sophisticated algorithm due to Earley, which
 is driven by priority grammars. Parsers for outer syntax are built up by functional
 parsing combinators. These combinators are a well-established technique for parsing,
 which has, for example, been described in Paulson's classic book \cite{paulson-ml2}.
+Isabelle developers are usually concerned with writing these outer syntax parsers,
-Isabelle developers are usually concerned with writing parsers for outer
+either for new definitional packages or for calling tactics with specific arguments.
-syntax, either for new definitional packages or for calling tactics. The library
-for writing such parsers in can roughly be split up into two parts.
+\begin{readmore}
+The library
+for writing parser combinators can be split up, roughly, into two parts.
 The first part consists of a collection of generic parser combinators defined
 in the structure @{ML_struct Scan} in the file
 @{ML_file "Pure/General/scan.ML"}. The second part of the library consists of
 combinators for dealing with specific token types, which are defined in the
 structure @{ML_struct OuterParse} in the file
 @{ML_file "Pure/Isar/outer_parse.ML"}.
+\end{readmore}
 *}
 section {* Building Up Generic Parsers *}
 @{ML_response_fake [display] "($$ \"x\") (explode \"world\")"
 "Exception FAIL raised"}
 will raise the exception @{ML_text "FAIL"}.
 There are three exceptions used in the parsing combinators:
+(FIXME: do the exception need to be explained, because the user cannot use them from ``outside''?)
 \begin{itemize}
 \item @{ML_text "FAIL"} is used to indicate that alternative routes of parsing
 might be explored.
 \item @{ML_text "MORE"} indicates that there is not enough input for the parser. For example
 Note how the result of consumed strings builds up on the left as nested pairs.
 Parsers that explore
 alternatives can be constructed using the function @{ML "(op ||)"}. For example, the
-parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, if it succeeds,
+parser @{ML_open "(p || q)" for p q} returns the result of @{ML_text "p"}, in case it succeeds,
 otherwise it returns the result of @{ML_text "q"}. For example
 @{ML_response [display]
 "let
 val hw = ($$ \"h\") || ($$ \"w\")
 one might write
 @{ML_open [display] "(p -- q) || r" for p q r}
 However, this parser is problematic for producing an appropriate error message, in case
-the parsing of @{ML_open "(p -- q)" for p q} fails. Because one loses the information
+the parsing of @{ML_open "(p -- q)" for p q} fails. Because in that case one loses with the parser
-that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case that @{ML_text p}
+above the information
-is present in the input, but not @{ML_text q}. So @{ML_open "(p -- q)" for p q} will fail and the
+that @{ML_text p} should be followed by @{ML_text q}. To see this consider the case in
+which @{ML_text p}
+is present in the input, but not @{ML_text q}. That means @{ML_open "(p -- q)" for p q} will fail
+and the
 alternative parser @{ML_text r} will be tried. However in many circumstanes this will be the wrong
-parser for the input and therefore will fail. The error message is then caused by the
+parser for the input ``p-followed-by-q'' and therefore will fail. The error message is then caused by the
-failure of @{ML_text r}, not by the absense of @{ML_text p} in the input. These situations
+failure of @{ML_text r}, not by the absense of @{ML_text q} in the input. This kind of situation
-can be avoided using the funtion @{ML "(op !!)"}. This function aborts the whole process of
+can be avoided by using the funtion @{ML "(op !!)"}. This function aborts the whole process of
 parsing in case of failure and invokes an error message. For example if we invoke the parser
 @{ML [display] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
-on @{ML_text "hello"}, the parsing succeeds
+on @{ML_text [quotes] "hello"}, the parsing succeeds
 @{ML_response [display]
 "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"hello\")"
 "(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
-but if we invoke it on @{ML_text "world"}
+but if we invoke it on @{ML_text [quotes] "world"}
 @{ML_response_fake [display] "(!! (fn _ => \"foo\") ($$ \"h\")) (explode \"world\")"
 "Exception ABORT raised"}
 the parsing aborts and the error message @{ML_text "foo"} is printed out. In order to
 @{ML_response_fake [display] "Scan.error ((!! (fn _ => \"foo\") ($$ \"h\")))"
 "Exception Error \"foo\" raised"}
 This ``prefixing'' is usually done by wrappers such as @{ML "OuterSyntax.command"}
-(FIXME: give reference to late).
+(FIXME: give reference to later place).
 Returning to our example of parsing @{ML_open "(p -- q) || r" for p q r}. If we want
-to generate the correct error message for @{ML_text q} not following @{ML_text p}, then
+to generate the correct error message for p-followed-by-q, then
 we have to write
 *}
 ML {*
 fun p_followed_by_q p q r =
 let
 val err = (fn _ => p ^ " is not followed by " ^ q)
 in
 (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r))
 text {*
 Most of the time, however, Isabelle developers have to deal with parsing tokens, not strings.
 This is because the parsers for the theory syntax, as well as the parsers for the
 argument syntax of proof methods and attributes, use the type
 @{ML_type OuterParse.token} (which is identical to the type @{ML_type OuterLex.token}).
+\begin{readmore}
 The parser functions for the theory syntax are contained in the structure
-@{ML_struct OuterParse} defined in the file\linebreak @{ML_file  "Pure/Isar/outer_parse.ML"}.
+@{ML_struct OuterParse} defined in the file @{ML_file  "Pure/Isar/outer_parse.ML"}.
+The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
+\end{readmore}
+*}
+ML {* OuterLex.mk_text "hello" *}
+ML {* print_depth 50 *}
+ML {*
+let open OuterLex in
+OuterSyntax.scan Position.none "for"
+end;
 *}
 ML {* map OuterLex.mk_text (explode "hello") *}
+ML {*
+fun my_scan lex pos str =
+Source.of_string str
+|> Symbol.source {do_recover = false}
+|> OuterLex.source {do_recover = SOME false}
+(fn () => pairself (Scan.make_lexicon o map Symbol.explode) lex) pos
+|> Source.exhaust;
+*}
+ML {*
+let val toks = my_scan (["hello"], []) Position.none "hello"
+in (OuterParse.$$$ "hello") toks end
+*}
 text {*
 @{ML_response_fake [display] "map OuterLex.mk_text (explode \"hello\")"
 "[Token (\<dots>), Token (\<dots>), Token (\<dots>), Token (\<dots>), Token (\<dots>)]"}

changeset 42	cd612b489504
parent 41	b11653b11bd3
child 43	02f76f1b6e7b