CookBook/Parsing.thy
changeset 47 4daf913fdbe1
parent 44 dee4b3e66dfe
child 48 609f9ef73494
  on, belong to the outer syntax, whereas items inside double quotation marks, such 
  as terms, types and so on, belong to the inner syntax. For parsing inner syntax, 
  Isabelle uses a rather general and sophisticated algorithm due to Earley, which 
  is driven by priority grammars. Parsers for outer syntax are built up by functional
  parsing combinators. These combinators are a well-established technique for parsing, 
  which has, for example, been described in Paulson's classic ML-book \cite{paulson-ml2}.
  Isabelle developers are usually concerned with writing these outer syntax parsers, 
  either for new definitional packages or for calling tactics with specific arguments. 

  \begin{readmore}
  The library 
  for parsers, except that they discard the item parsed by the first (respectively second)
  parser. For example

@{ML_response [display]
"let 
  val just_e = ($$ \"h\") |-- ($$ \"e\") 
  val just_h = ($$ \"h\") --| ($$ \"e\") 
  val input = (explode \"hello\")  
in 
  (just_e input, just_h input)
end"
  "((\"e\", [\"l\", \"l\", \"o\"]),(\"h\", [\"l\", \"l\", \"o\"]))"}
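
  For comparison, the plain sequencing combinator @{ML_text "--"} keeps the results of 
  both parsers as a pair; a small sketch with the same input:

@{ML_response [display]
"(($$ \"h\") -- ($$ \"e\")) (explode \"hello\")"
"((\"h\", \"e\"), [\"l\", \"l\", \"o\"])"}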

  The parser @{ML_open "Scan.optional p x" for p x} returns the result of the parser 
  @{ML_text "p"}, if it succeeds; otherwise it returns the default value @{ML_text "x"}.
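
  For example, a small sketch of this behaviour on an input where @{ML_text "p"} fails:

@{ML_response [display]
"Scan.optional ($$ \"h\") \"x\" (explode \"bello\")"
"(\"x\", [\"b\", \"e\", \"l\", \"l\", \"o\"])"}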
  If we want to generate the correct error message for p-followed-by-q, then
  we have to write, for example:
*}

ML {* 
fun p_followed_by_q p q r =
  let 
     val err = (fn _ => p ^ " is not followed by " ^ q)
  in
    (($$ p) -- (!! err ($$ q))) || (($$ r) -- ($$ r))
  end
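
  (* A hypothetical illustration: applying p_followed_by_q "h" "e" "w" to
     (explode "holle") parses "h" but then fails on "o"; because of the
     cut (!!), this failure is reported with the message
     "h is not followed by e", instead of backtracking into the
     ($$ r) -- ($$ r) alternative. *)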
  The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
  \end{readmore}

  The structure @{ML_struct OuterLex} defines several kinds of token (for example 
  @{ML "OuterLex.Ident"} for identifiers, @{ML "OuterLex.Keyword"} for keywords and
  @{ML "OuterLex.Command"} for commands). Some parsers take into account the 
  kind of token.

  We can generate a token list using the function @{ML "OuterSyntax.scan"}, to which we
  give @{ML "Position.none"} as argument since, at the moment, we are not interested in
  generating precise error messages. The following\footnote{There is something funny 
  going on with the pretty printing of the result token list.}

@{ML_response [display] "OuterSyntax.scan Position.none \"hello world\"" 
"[OuterLex.Token (\<dots>,(OuterLex.Ident, \"hello\"),\<dots>), 
 OuterLex.Token (\<dots>,(OuterLex.Space, \" \"),\<dots>), 
 OuterLex.Token (\<dots>,(OuterLex.Ident, \"world\"),\<dots>)]"}

  produces three tokens where the first and the last are identifiers, since 
  @{ML_text [quotes] "hello"} and @{ML_text [quotes] "world"} do not match 
  any other category. The second indicates a space. If we parse

@{ML_response [display] "OuterSyntax.scan Position.none \"inductive|for\"" 
"[OuterLex.Token (\<dots>,(OuterLex.Command, \"inductive\"),\<dots>), 
 OuterLex.Token (\<dots>,(OuterLex.Keyword, \"|\"),\<dots>), 
 OuterLex.Token (\<dots>,(OuterLex.Keyword, \"for\"),\<dots>)]"}

  we obtain a list of command and keyword tokens.
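
  Whitespace tokens, such as the space token in the first example, can be filtered 
  out with the function @{ML "OuterLex.is_proper"}; a small sketch:

@{ML_response [display] 
"filter OuterLex.is_proper (OuterSyntax.scan Position.none \"hello world\")" 
"[OuterLex.Token (\<dots>,(OuterLex.Ident, \"hello\"),\<dots>), 
 OuterLex.Token (\<dots>,(OuterLex.Ident, \"world\"),\<dots>)]"}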
       
  If you want to see which keywords and commands are currently known, use
  the following (you might have to adjust the @{ML print_depth} in order to
  see the complete list):

@{ML_response_fake [display] 
"let 
  val (keywords, commands) = OuterKeyword.get_lexicons ()
in 
  (Scan.dest_lexicon commands, Scan.dest_lexicon keywords)
end" 
"([\"}\",\"{\",\<dots>],[\"\<rightleftharpoons>\",\"\<leftharpoondown>\",\<dots>])"}

  Now the parser @{ML "OuterParse.$$$"} parses a single keyword. For example
 
@{ML_response [display]
"let 
in 
  (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end" 
"([\"in\",\"in\",\"in\"],[\<dots>])"}

  Note that we had to add a @{ML_text [quotes] "\\n"} at the end of the parsed
  string, otherwise the parser would have consumed all tokens and then failed with
  the exception @{ML_text "FAIL"}. @{ML "OuterParse.enum1"} works similarly, 
  except that the parsed list must be non-empty.
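
  For instance, applied to an input that does not start with the keyword 
  @{ML_text [quotes] "in"}, @{ML "OuterParse.enum"} simply returns an empty list, 
  whereas @{ML "OuterParse.enum1"} would fail (a small sketch):

@{ML_response [display]
"let 
  val input = OuterSyntax.scan Position.none \"world\\n\"
in 
  (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end" 
"([],[\<dots>])"}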

*}

text {* FIXME explain @{ML "OuterParse.!!!"} *}

  OuterParse.prop input
end 

*}

text {* (FIXME funny output for a proposition) *}


chapter {* Parsing *}

  \end{verbatim}
  but here are some runnable examples for viewing tokens: 

*}


ML {*
  (* scan a string into a list of outer syntax tokens *)
  val toks = OuterSyntax.scan Position.none
   "theory,imports;begin x.y.z apply ?v1 ?'a 'a -- || 44 simp (* xx *) { * fff * }" ;
*}
       
ML {*
  (* increase the print depth so that the whole token list is displayed *)
  print_depth 20 ;
*}

ML {*
  (* the text of each token *)
  map OuterLex.text_of toks ;
*}

ML {*
  (* keep only the proper tokens, for example no white space *)
  val proper_toks = filter OuterLex.is_proper toks ;
*}
       
ML {*
  (* the kind of each token, for example Ident or Keyword *)
  map OuterLex.kind_of proper_toks 
*}

ML {*
  (* turn the tokens back into strings *)
  map OuterLex.unparse proper_toks ;
*}

ML {*
  (* the stopper, used to recognise the end of the input when parsing a finite token list *)
  OuterLex.stopper
*}

text {*

  The function \texttt{is\_proper : token -> bool} identifies tokens which are