ProgTutorial/Parsing.thy
changeset 220 fbcb89d84ba6
parent 211 d5accbc67e1b
child 221 a9eb69749c93
equal deleted inserted replaced
214:7e04dc2368b0 220:fbcb89d84ba6
    81 in
    81 in
    82     (hw input1, hw input2)
    82     (hw input1, hw input2)
    83 end"
    83 end"
    84     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
    84     "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"}
    85 
    85 
    86   Two parser can be connected in sequence by using the function @{ML "--"}. 
    86   Two parsers can be connected in sequence by using the function @{ML "--"}. 
    87   For example parsing @{text "h"}, @{text "e"} and @{text "l"} in this 
    87   For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this 
    88   sequence you can achieve by:
    88   order) you can achieve by:
    89 
    89 
    90   @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
    90   @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")"
    91                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    91                           "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"}
    92 
    92 
    93   Note how the result of consumed strings builds up on the left as nested pairs.  
    93   Note how the result of consumed strings builds up on the left as nested pairs.  
    97 
    97 
    98   @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
    98   @{ML_response [display,gray] "Scan.this_string \"hell\" (explode \"hello\")"
    99                           "(\"hell\", [\"o\"])"}
    99                           "(\"hell\", [\"o\"])"}
   100 
   100 
   101   Parsers that explore alternatives can be constructed using the function @{ML
   101   Parsers that explore alternatives can be constructed using the function @{ML
   102   "||"}. For example, the parser @{ML "(p || q)" for p q} returns the
   102   "||"}. The parser @{ML "(p || q)" for p q} returns the
   103   result of @{text "p"}, in case it succeeds, otherwise it returns the
   103   result of @{text "p"}, in case it succeeds, otherwise it returns the
   104   result of @{text "q"}. For example:
   104   result of @{text "q"}. For example:
   105 
   105 
   106 
   106 
   107 @{ML_response [display,gray] 
   107 @{ML_response [display,gray] 
   153 in 
   153 in 
   154   (p input1, p input2)
   154   (p input1, p input2)
   155 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   155 end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"}
   156 
   156 
   157   The function @{ML "!!"} helps to produce appropriate error messages
   157   The function @{ML "!!"} helps to produce appropriate error messages
   158   during parsing. For example if you want to parse that @{text p} is immediately 
   158   for parsing. For example if you want to parse @{text p} immediately 
   159   followed by @{text q}, or start a completely different parser @{text r},
   159   followed by @{text q}, or start a completely different parser @{text r},
   160   you might write:
   160   you might write:
   161 
   161 
   162   @{ML [display,gray] "(p -- q) || r" for p q r}
   162   @{ML [display,gray] "(p -- q) || r" for p q r}
   163 
   163 
   164   However, this parser is problematic for producing an appropriate error
   164   However, this parser is problematic for producing an appropriate error
   165   message, in case the parsing of @{ML "(p -- q)" for p q} fails. Because in
   165   message, if the parsing of @{ML "(p -- q)" for p q} fails. Because in that
   166   that case you lose the information that @{text p} should be followed by
   166   case you lose the information that @{text p} should be followed by @{text q}.
   167   @{text q}. To see this consider the case in which @{text p} is present in
   167   To see this assume that @{text p} is present in the input, but it is not
   168   the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail
   168   followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and
   169   and the alternative parser @{text r} will be tried. However in many
   169   hence the alternative parser @{text r} will be tried. However, in many
   170   circumstance this will be the wrong parser for the input ``p-followed-by-q''
   170   circumstances this will be the wrong parser for the input ``p-followed-by-something''
   171   and therefore will also fail. The error message is then caused by the
   171   and therefore will also fail. The error message is then caused by the failure
   172   failure of @{text r}, not by the absence of @{text q} in the input. This
   172   of @{text r}, not by the absence of @{text q} in the input. This kind of
   173   kind of situation can be avoided when using the function @{ML "!!"}. 
   173   situation can be avoided when using the function @{ML "!!"}.  This function
   174   This function aborts the whole process of parsing in case of a
   174   aborts the whole process of parsing in case of a failure and prints an error
   175   failure and prints an error message. For example if you invoke the parser
   175   message. For example if you invoke the parser
   176 
   176 
   177   
   177   
   178   @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
   178   @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"}
   179 
   179 
   180   on @{text [quotes] "hello"}, the parsing succeeds
   180   on @{text [quotes] "hello"}, the parsing succeeds
   210   ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r)
   210   ($$ p -- (!! err_msg ($$ q))) || ($$ r -- $$ r)
   211 end *}
   211 end *}
   212 
   212 
   213 
   213 
   214 text {*
   214 text {*
   215   Running this parser with the @{text [quotes] "h"} and @{text [quotes] "e"}, and 
   215   Running this parser with the arguments
       
   216   @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and 
   216   the input @{text [quotes] "holle"} 
   217   the input @{text [quotes] "holle"} 
   217 
   218 
   218   @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"
   219   @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")"
   219                                "Exception ERROR \"h is not followed by e\" raised"} 
   220                                "Exception ERROR \"h is not followed by e\" raised"} 
   220 
   221 
   289    Scan.finite Symbol.stopper p input2)
   290    Scan.finite Symbol.stopper p input2)
   290 end"
   291 end"
   291 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   292 "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []),
   292  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   293  ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"}
   293 
   294 
   294   After parsing is done, you nearly always want to apply a function on the parsed 
   295   After parsing is done, you almost always want to apply a function to the parsed 
   295   items. One way to do this is the function @{ML "(p >> f)" for p f}, which runs 
   296   items. One way to do this is the function @{ML "(p >> f)" for p f}, which runs 
   296   first the parser @{text p} and upon successful completion applies the 
   297   first the parser @{text p} and upon successful completion applies the 
   297   function @{text f} to the result. For example
   298   function @{text f} to the result. For example
   298 
   299 
   299 @{ML_response [display,gray]
   300 @{ML_response [display,gray]
   337 
   338 
   338   (FIXME: In which situations is this useful? Give examples.) 
   339   (FIXME: In which situations is this useful? Give examples.) 
   339 
   340 
   340   \begin{exercise}\label{ex:scancmts}
   341   \begin{exercise}\label{ex:scancmts}
   341   Write a parser that parses an input string so that any comment enclosed
   342   Write a parser that parses an input string so that any comment enclosed
   342   inside @{text "(*\<dots>*)"} is replaced by a the same comment but enclosed inside
   343   within @{text "(*\<dots>*)"} is replaced by the same comment but enclosed within
   343   @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
   344   @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the
   344   function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
   345   function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML
   345   "s1 ^ s ^ s2" for s1 s2 s}.
   346   "s1 ^ s ^ s2" for s1 s2 s}.
   346   \end{exercise}
   347   \end{exercise}
   347 *}
   348 *}
   518 
   519 
   519   (FIXME: or give parser for numbers)
   520   (FIXME: or give parser for numbers)
   520 
   521 
   521   Whenever there is a possibility that the processing of user input can fail, 
   522   Whenever there is a possibility that the processing of user input can fail, 
   522   it is a good idea to give as much information about where the error 
   523   it is a good idea to give as much information about where the error 
   523   occured. For this Isabelle can attach positional information to tokens
   524   occurred. For this Isabelle can attach positional information to tokens
   524   and then thread this information up the processing chain. To see this,
   525   and then thread this information up the processing chain. To see this,
   525   modify the function @{ML filtered_input} described earlier to 
   526   modify the function @{ML filtered_input} described earlier to 
   526 *}
   527 *}
   527 
   528 
   528 ML{*fun filtered_input' str = 
   529 ML{*fun filtered_input' str = 
   613 section {* Parsing Specifications\label{sec:parsingspecs} *}
   614 section {* Parsing Specifications\label{sec:parsingspecs} *}
   614 
   615 
   615 text {*
   616 text {*
   616   There are a number of special purpose parsers that help with parsing
   617   There are a number of special purpose parsers that help with parsing
   617   specifications of function definitions, inductive predicates and so on. In
   618   specifications of function definitions, inductive predicates and so on. In
   618   Capter~\ref{chp:package}, for example, we will need to parse specifications
   619   Chapter~\ref{chp:package}, for example, we will need to parse specifications
   619   for inductive predicates of the form:
   620   for inductive predicates of the form:
   620 *}
   621 *}
   621 
   622 
   622 simple_inductive
   623 simple_inductive
   623   even and odd
   624   even and odd
   689 text {*
   690 text {*
   690   Whenever types are given, they are stored in the @{ML SOME}s. The types are
   691   Whenever types are given, they are stored in the @{ML SOME}s. The types are
   691   not yet used to type the variables: this must be done by type-inference later
   692   not yet used to type the variables: this must be done by type-inference later
   692   on. Since types are part of the inner syntax they are strings with some
   693   on. Since types are part of the inner syntax they are strings with some
   693   encoded information (see previous section). If a syntax translation is
   694   encoded information (see previous section). If a syntax translation is
   694   present for a variable, then it is stored in the @{ML Mixfix} datastructure;
   695   present for a variable, then it is stored in the @{ML Mixfix} data structure;
   695   no syntax translation is indicated by @{ML NoSyn}.
   696   no syntax translation is indicated by @{ML NoSyn}.
   696 
   697 
   697   \begin{readmore}
   698   \begin{readmore}
   698   The datastructre for sytax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
   699   The data structure for syntax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}.
   699   \end{readmore}
   700   \end{readmore}
   700 
   701 
   701   Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
   702   Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a
   702   list of introduction rules, that is propositions with theorem
   703   list of introduction rules, that is propositions with theorem
   703   annotations. The introduction rules are propositions parsed by @{ML
   704   annotations. The introduction rules are propositions parsed by @{ML
  1281   \texttt{SpecParse.token} are all the same).
  1282   \texttt{SpecParse.token} are all the same).
  1282   
  1283   
  1283   Input text is split up into tokens, and the input source type for many parsing
  1284   Input text is split up into tokens, and the input source type for many parsing
  1284   functions is \texttt{token list}.
  1285   functions is \texttt{token list}.
  1285 
  1286 
  1286   The datatype definition (which is not published in the signature) is
  1287   The data type definition (which is not published in the signature) is
  1287   \begin{verbatim}
  1288   \begin{verbatim}
  1288   datatype token = Token of Position.T * (token_kind * string);
  1289   datatype token = Token of Position.T * (token_kind * string);
  1289   \end{verbatim}
  1290   \end{verbatim}
  1290   but here are some runnable examples for viewing tokens: 
  1291   but here are some runnable examples for viewing tokens: 
  1291 
  1292 
  1466 
  1467 
  1467 section{* The \texttt{Args} structure *}
  1468 section{* The \texttt{Args} structure *}
  1468 
  1469 
  1469 text {*
  1470 text {*
  1470   The source file is \texttt{src/Pure/Isar/args.ML}.
  1471   The source file is \texttt{src/Pure/Isar/args.ML}.
  1471   The primary type of this structure is the \texttt{src} datatype;
  1472   The primary type of this structure is the \texttt{src} data type;
  1472   the single constructors not published in the signature, but 
  1473   the single constructors not published in the signature, but 
  1473   \texttt{Args.src} and \texttt{Args.dest\_src}
  1474   \texttt{Args.src} and \texttt{Args.dest\_src}
  1474   are in fact the constructor and destructor functions.
  1475   are in fact the constructor and destructor functions.
  1475   Note that the types \texttt{Attrib.src} and \texttt{Method.src}
  1476   Note that the types \texttt{Attrib.src} and \texttt{Method.src}
  1476   are in fact \texttt{Args.src}.
  1477   are in fact \texttt{Args.src}.