ProgTutorial/Parsing.thy
changeset 230 8def50824320
parent 229 abc7f90188af
child 236 7b6d81ff9d9a
theory Parsing
imports Base "Package/Simple_Inductive_Package"
begin

chapter {* Parsing *}

text {*

ML{*type 'a parser = OuterLex.token list -> 'a * OuterLex.token list*}

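text {*
  As a small illustration of this type (a sketch that anticipates the function
  @{ML "OuterParse.$$$"}, which is described below; the name @{text where_parser}
  is made up for this illustration): the parser recognising the keyword
  @{text [quotes] "where"} returns the consumed string and therefore has the
  type @{ML_type "string parser"}.
*}

ML{*val where_parser : string parser = OuterParse.$$$ "where" *}
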
text {*
  The reason for using token parsers is that theory syntax, as well as the
  parsers for the arguments of proof methods, use the type @{ML_type
  OuterLex.token}.

  \begin{readmore}
  The parser functions for the theory syntax are contained in the structure
  @{ML_struct OuterParse} defined in the file @{ML_file "Pure/Isar/outer_parse.ML"}.
  The definition for tokens is in the file @{ML_file "Pure/Isar/outer_lex.ML"}.
  \end{readmore}

  The structure @{ML_struct OuterLex} defines several kinds of tokens (for
  example @{ML "Ident" in OuterLex} for identifiers, @{ML "Keyword" in
  OuterLex} for keywords and @{ML "Command" in OuterLex} for commands). Some
  token parsers take into account the kind of tokens. The first example shows
  how to generate a token list out of a string using the function @{ML
  "OuterSyntax.scan"}. It is given the argument @{ML "Position.none"} since,
  at the moment, we are not interested in generating precise error
  messages. The following code\footnote{Note that because of a possible bug in
  the PolyML runtime system, the result is printed as @{text [quotes] "?"}
  instead of the tokens.}

@{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" 
"[Token (\<dots>,(Ident, \"hello\"),\<dots>), 
 Token (\<dots>,(Space, \" \"),\<dots>), 
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

  produces three tokens, where the first and the last are identifiers, since
  @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any
  other syntactic category. The second indicates a space.

  We can easily change what is recognised as a keyword with 
  @{ML OuterKeyword.keyword}. For example, calling this function 
*}

ML{*val _ = OuterKeyword.keyword "hello"*}

text {*
  then lexing @{text [quotes] "hello world"} will produce

  @{ML_response_fake [display,gray] "OuterSyntax.scan Position.none \"hello world\"" 
"[Token (\<dots>,(Keyword, \"hello\"),\<dots>), 
 Token (\<dots>,(Space, \" \"),\<dots>), 
 Token (\<dots>,(Ident, \"world\"),\<dots>)]"}

  Many parsing functions later on will require spaces, comments and the like
  to have already been filtered out. So from now on we are going to use the
  functions @{ML filter} and @{ML OuterLex.is_proper} to do this. For example:

@{ML_response_fake [display,gray] 
"let
   val input = OuterSyntax.scan Position.none \"hello world\"
in
   filter OuterLex.is_proper input
end" 
"[Token (\<dots>,(Ident, \"hello\"), \<dots>), Token (\<dots>,(Ident, \"world\"), \<dots>)]"}

  For convenience we define the function:
*}

ML{*fun filtered_input str = 
  filter OuterLex.is_proper (OuterSyntax.scan Position.none str) *}

text {*
  If you now parse

@{ML_response_fake [display,gray] 
"filtered_input \"inductive | for\"" 
"[Token (\<dots>,(Command, \"inductive\"),\<dots>), 
in 
  (OuterParse.$$$ \"where\" input1, OuterParse.$$$ \"|\" input2)
end"
"((\"where\",\<dots>), (\"|\",\<dots>))"}

  Any non-keyword string can be parsed with the function @{ML OuterParse.reserved}.
  For example:

  @{ML_response [display,gray]
"let 
  val p = OuterParse.reserved \"bar\"
  val input = filtered_input \"bar\"
in
  p input
end"
  "(\"bar\",[])"}

  Like before, you can sequentially connect parsers with @{ML "--"}. For example:

@{ML_response [display,gray]
"let 
  val input = filtered_input \"| in\"
         (OuterParse.enum \"|\" (OuterParse.$$$ \"in\")) input
end" 
"([\"in\", \"in\", \"in\"], [])"}

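  A related combinator is @{ML "OuterParse.enum1"}, which behaves like
  @{ML "OuterParse.enum"} but requires at least one element to be present. A
  small sketch, wrapped in @{ML "Scan.finite"} with @{ML "OuterLex.stopper"}
  so that the parser can terminate at the end of the input:

  @{ML_response_fake [display,gray]
"let 
  val input = filtered_input \"in | in\"
in 
  Scan.finite OuterLex.stopper 
    (OuterParse.enum1 \"|\" (OuterParse.$$$ \"in\")) input
end"
"([\"in\", \"in\"], [])"}
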
  The following function will help to run examples.
*}

ML{*fun parse p input = Scan.finite OuterLex.stopper (Scan.error p) input *}

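text {*
  For example, here is a small sketch of running two sequentially connected
  keyword parsers with this helper (building on @{ML filtered_input} defined
  above):

  @{ML_response_fake [display,gray]
"parse (OuterParse.$$$ \"|\" -- OuterParse.$$$ \"in\") (filtered_input \"| in\")"
"((\"|\", \"in\"), [])"}
*}
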
text {*
  \begin{readmore}
  The functions related to positions are implemented in the file
  @{ML_file "Pure/General/position.ML"}.
  \end{readmore}

*}

text {*
  (FIXME: there are also handy parsers for ML-expressions and ML-files)
*}

section {* Context Parser (TBD) *}

text {*

  @{text [display] "$ isabelle emacs -k foobar a_theory_file"}

  If you now build a theory on top of @{text "Command.thy"}, 
  then the command \isacommand{foobar} can be used. 
  Similarly with any other new command, and also any new keyword that is 
  introduced with
*}

ML{*val _ = OuterKeyword.keyword "blink" *}

text {*
  At the moment the command \isacommand{foobar} is not very useful. Let us refine 
  it a bit next by letting it take a proposition as argument and printing this 
  proposition inside the tracing buffer. 

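  As a rough sketch, the two ingredients such a refinement needs, namely a
  parser for the proposition and a function that does the printing, could be
  combined as follows (the names are made up for illustration only and this
  is not the definition developed in the text):

  @{text [display]
"fun trace_prop str = 
  LocalTheory.theory (fn thy => (tracing str; thy))

val trace_prop_parser = OuterParse.prop >> trace_prop"}
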
  The crucial part of a command is the function that determines the behaviour
  of the command. In the code above we used a ``do-nothing''-function, which
  \isacommand{apply}@{text "(rule conjI)"}\\
  \isacommand{apply}@{text "(rule TrueI)+"}\\
  \isacommand{done}
  \end{isabelle}

  (FIXME: read a name and show how to store theorems)
*}

section {* Methods (TBD) *}

text {*