on, belong to the outer syntax, whereas items inside double quotation marks, such
as terms, types and so on, belong to the inner syntax. For parsing inner syntax,
Isabelle uses a rather general and sophisticated algorithm due to Earley, which
is driven by priority grammars. Parsers for outer syntax are built up by functional
parsing combinators. These combinators are a well-established technique for parsing,
which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}.
Isabelle developers are usually concerned with writing these outer syntax parsers,
either for new definitional packages or for calling tactics with specific arguments.

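To give a first flavour of this combinator style, here is a minimal sketch: the
single-character parser @{ML_text "$$"} (explained in detail in this chapter) consumes
one character from an exploded string, and combinators then assemble such atomic
parsers into larger ones.

@{ML_response [display]
"($$ \"h\") (explode \"hello\")"
"(\"h\", [\"e\", \"l\", \"l\", \"o\"])"}
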
\begin{readmore}
The library
for parsers, except that they discard the item parsed by the first (respectively second)
parser. For example

@{ML_response [display]
"let
  val just_e = ($$ \"h\") |-- ($$ \"e\")
  val just_h = ($$ \"h\") --| ($$ \"e\")
  val input = (explode \"hello\")
in
  (just_e input, just_h input)
end"
"((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}

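For comparison, a small sketch with the plain sequencing combinator @{ML_text "--"},
which keeps both parsed items:

@{ML_response [display]
"(($$ \"h\") -- ($$ \"e\")) (explode \"hello\")"
"((\"h\", \"e\"), [\"l\", \"l\", \"o\"])"}
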
The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser
@{ML_text "p"}, if it succeeds; otherwise it returns
The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
\end{readmore}

The structure @{ML_struct OuterLex} defines several kinds of tokens (for example
@{ML "OuterLex.Ident"} for identifiers, @{ML "OuterLex.Keyword"} for keywords and
@{ML "OuterLex.Command"} for commands). Some parsers take into account the
kind of token.

We can generate a token list using the function @{ML "OuterSyntax.scan"}, to which we
pass @{ML "Position.none"} as argument below since, at the moment, we are not interested
in generating precise error messages. The following\footnote{There is something funny
going on with the pretty printing of the result token list.}

@{ML_response [display] "OuterSyntax.scan Position.none \"hello world\""
"[OuterLex.Token (\<dots>,(OuterLex.Ident, \"hello\"),\<dots>),
 OuterLex.Token (\<dots>,(OuterLex.Space, \" \"),\<dots>),
 OuterLex.Token (\<dots>,(OuterLex.Ident, \"world\"),\<dots>)]"}

produces three tokens where the first and the last are identifiers, since
@{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match
any other category. The second indicates a space. If we parse

@{ML_response [display] "OuterSyntax.scan Position.none \"inductive|for\""
"[OuterLex.Token (\<dots>,(OuterLex.Command, \"inductive\"),\<dots>),
 OuterLex.Token (\<dots>,(OuterLex.Keyword, \"|\"),\<dots>),
 OuterLex.Token (\<dots>,(OuterLex.Keyword, \"for\"),\<dots>)]"}

we obtain a list of command and keyword tokens.
If you want to see which keywords and commands are currently known, use
the following (you might have to adjust the @{ML print_depth} in order to
see the complete list):

@{ML_response_fake [display]
"let
  val (keywords, commands) = OuterKeyword.get_lexicons ()
in
  (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
end"
"([\"}\",\"{\",\<dots>],[\"\<rightleftharpoons>\",\"\<leftharpoondown>\",\<dots>])"}

Now the parser @{ML "OuterParse.$$$"} parses a single keyword. For example

@{ML_response [display]
"let
\end{verbatim}
but here are some runnable examples for viewing tokens:

*}

ML {*
(* scan a string into a list of outer syntax tokens *)
val toks = OuterSyntax.scan Position.none
  "theory,imports;begin x.y.z apply ?v1 ?'a 'a -- || 44 simp (* xx *) { * fff * }" ;
*}

ML {*
(* increase the print depth so that the complete token list is displayed *)
print_depth 20 ;
*}

ML {*
map OuterLex.text_of toks ;
*}

ML {*
val proper_toks = filter OuterLex.is_proper toks ;
*}

ML {*
map OuterLex.kind_of proper_toks
*}

ML {*
map OuterLex.unparse proper_toks ;
*}

ML {*
OuterLex.stopper
*}

text {*

The function \texttt{is\_proper : token -> bool} identifies tokens which are