# HG changeset patch # User griff # Date 1238590209 -7200 # Node ID 98f53ab3722e138d65e157d90a49f3f429f53c20 # Parent a9eb69749c935c525fabb3a1d4e6f8c73f0f2115# Parent 7ff7325e3b4eed6fc73e4275752fdf8f1fab469f * merge diff -r 7ff7325e3b4e -r 98f53ab3722e ProgTutorial/Parsing.thy --- a/ProgTutorial/Parsing.thy Tue Mar 31 20:31:18 2009 +0100 +++ b/ProgTutorial/Parsing.thy Wed Apr 01 14:50:09 2009 +0200 @@ -83,9 +83,9 @@ end" "((\"h\", [\"e\", \"l\", \"l\", \"o\"]),(\"w\", [\"o\", \"r\", \"l\", \"d\"]))"} - Two parser can be connected in sequence by using the function @{ML "--"}. - For example parsing @{text "h"}, @{text "e"} and @{text "l"} in this - sequence you can achieve by: + Two parsers can be connected in sequence by using the function @{ML "--"}. + For example parsing @{text "h"}, @{text "e"} and @{text "l"} (in this + order) you can achieve by: @{ML_response [display,gray] "(($$ \"h\") -- ($$ \"e\") -- ($$ \"l\")) (explode \"hello\")" "(((\"h\", \"e\"), \"l\"), [\"l\", \"o\"])"} @@ -99,7 +99,7 @@ "(\"hell\", [\"o\"])"} Parsers that explore alternatives can be constructed using the function @{ML - "||"}. For example, the parser @{ML "(p || q)" for p q} returns the + "||"}. The parser @{ML "(p || q)" for p q} returns the result of @{text "p"}, in case it succeeds, otherwise it returns the result of @{text "q"}. For example: @@ -155,24 +155,24 @@ end" "((SOME \"h\", [\"e\", \"l\", \"l\", \"o\"]), (NONE, [\"w\", \"o\", \"r\", \"l\", \"d\"]))"} The function @{ML "!!"} helps to produce appropriate error messages - during parsing. For example if you want to parse that @{text p} is immediately + for parsing. For example if you want to parse @{text p} immediately followed by @{text q}, or start a completely different parser @{text r}, you might write: @{ML [display,gray] "(p -- q) || r" for p q r} However, this parser is problematic for producing an appropriate error - message, in case the parsing of @{ML "(p -- q)" for p q} fails. Because in - that case you lose the information that @{text p} should be followed by - @{text q}. To see this consider the case in which @{text p} is present in - the input, but not @{text q}. That means @{ML "(p -- q)" for p q} will fail - and the alternative parser @{text r} will be tried. However in many - circumstance this will be the wrong parser for the input ``p-followed-by-q'' - and therefore will also fail. The error message is then caused by the - failure of @{text r}, not by the absence of @{text q} in the input. This - kind of situation can be avoided when using the function @{ML "!!"}. - This function aborts the whole process of parsing in case of a - failure and prints an error message. For example if you invoke the parser + message, if the parsing of @{ML "(p -- q)" for p q} fails. Because in that + case you lose the information that @{text p} should be followed by @{text q}. + To see this assume that @{text p} is present in the input, but it is not + followed by @{text q}. That means @{ML "(p -- q)" for p q} will fail and + hence the alternative parser @{text r} will be tried. However, in many + circumstances this will be the wrong parser for the input ``p-followed-by-something'' + and therefore will also fail. The error message is then caused by the failure + of @{text r}, not by the absence of @{text q} in the input. This kind of + situation can be avoided when using the function @{ML "!!"}. This function + aborts the whole process of parsing in case of a failure and prints an error + message. For example if you invoke the parser @{ML [display,gray] "(!! (fn _ => \"foo\") ($$ \"h\"))"} @@ -212,7 +212,8 @@ text {* - Running this parser with the @{text [quotes] "h"} and @{text [quotes] "e"}, and + Running this parser with the arguments + @{text [quotes] "h"}, @{text [quotes] "e"} and @{text [quotes] "w"}, and the input @{text [quotes] "holle"} @{ML_response_fake [display,gray] "Scan.error (p_followed_by_q \"h\" \"e\" \"w\") (explode \"holle\")" @@ -291,7 +292,7 @@ "(([\"f\", \"o\", \"o\", \"o\", \"o\", \"o\"], []), ([\"f\", \"o\", \"o\"], [\"*\", \"o\", \"o\", \"o\"]))"} - After parsing is done, you nearly always want to apply a function on the parsed + After parsing is done, you almost always want to apply a function to the parsed items. One way to do this is the function @{ML "(p >> f)" for p f}, which runs first the parser @{text p} and upon successful completion applies the function @{text f} to the result. For example @@ -339,7 +340,7 @@ \begin{exercise}\label{ex:scancmts} Write a parser that parses an input string so that any comment enclosed - inside @{text "(*\<dots>*)"} is replaced by a the same comment but enclosed inside + within @{text "(*\<dots>*)"} is replaced by the same comment but enclosed within @{text "(**\<dots>**)"} in the output string. To enclose a string, you can use the function @{ML "enclose s1 s2 s" for s1 s2 s} which produces the string @{ML "s1 ^ s ^ s2" for s1 s2 s}. @@ -389,13 +390,13 @@ produces three tokens where the first and the last are identifiers, since @{text [quotes] "hello"} and @{text [quotes] "world"} do not match any - other syntactic category.\footnote{Note that because of a possible a bug in - the PolyML runtime system the result is printed as @{text [quotes] "?"}, instead of + other syntactic category.\footnote{Note that because of a possible bug in + the PolyML runtime system, the result is printed as @{text [quotes] "?"}, instead of the tokens.} The second indicates a space. Many parsing functions later on will require spaces, comments and the like to have already been filtered out. So from now on we are going to use the - functions @{ML filter} and @{ML OuterLex.is_proper} do this. For example: + functions @{ML filter} and @{ML OuterLex.is_proper} to do this. For example: @{ML_response_fake [display,gray] "let @@ -422,7 +423,7 @@ Token (\<dots>,(Keyword, \"|\"),\<dots>), Token (\<dots>,(Keyword, \"for\"),\<dots>)]"} - you obtain a list consisting of only a command and two keyword tokens. + you obtain a list consisting of only one command and two keyword tokens. If you want to see which keywords and commands are currently known to Isabelle, type in the following code (you might have to adjust the @{ML print_depth} in order to see the complete list): @@ -497,7 +498,7 @@ parser in case of a dead end, just like @{ML "Scan.!!"} (see previous section). Except that the error message of @{ML "OuterParse.!!!"} is fixed to be @{text [quotes] "Outer syntax error"} - with a relatively precise description of the failure. For example: + together with a relatively precise description of the failure. For example: @{ML_response_fake [display,gray] "let @@ -519,8 +520,8 @@ (FIXME: or give parser for numbers) Whenever there is a possibility that the processing of user input can fail, - it is a good idea to give as much information about where the error - occured. For this Isabelle can attach positional information to tokens + it is a good idea to give all available information about where the error + occurred. For this Isabelle can attach positional information to tokens and then thread this information up the processing chain. To see this, modify the function @{ML filtered_input} described earlier to *} @@ -615,7 +616,7 @@ text {* There are a number of special purpose parsers that help with parsing specifications of function definitions, inductive predicates and so on. In - Capter~\ref{chp:package}, for example, we will need to parse specifications + Chapter~\ref{chp:package}, for example, we will need to parse specifications for inductive predicates of the form: *} @@ -691,11 +692,11 @@ not yet used to type the variables: this must be done by type-inference later on. Since types are part of the inner syntax they are strings with some encoded information (see previous section). If a syntax translation is - present for a variable, then it is stored in the @{ML Mixfix} datastructure; + present for a variable, then it is stored in the @{ML Mixfix} data structure; no syntax translation is indicated by @{ML NoSyn}. \begin{readmore} - The datastructre for sytax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. + The data structure for syntax annotations is defined in @{ML_file "Pure/Syntax/mixfix.ML"}. \end{readmore} Lines 3 to 7 in the function @{ML spec_parser} implement the parser for a @@ -1014,7 +1015,7 @@ text {* (FIXME: maybe move to after the tactic section) - Methods are a central to Isabelle. They are the ones you use for example + Methods are central to Isabelle. They are the ones you use for example in \isacommand{apply}. To print out all currently known methods you can use the Isabelle command: @@ -1284,7 +1285,7 @@ Input text is split up into tokens, and the input source type for many parsing functions is \texttt{token list}. - The datatype definition (which is not published in the signature) is + The data type definition (which is not published in the signature) is \begin{verbatim} datatype token = Token of Position.T * (token_kind * string); \end{verbatim} @@ -1469,7 +1470,7 @@ text {* The source file is \texttt{src/Pure/Isar/args.ML}. - The primary type of this structure is the \texttt{src} datatype; + The primary type of this structure is the \texttt{src} data type; the single constructors not published in the signature, but \texttt{Args.src} and \texttt{Args.dest\_src} are in fact the constructor and destructor functions. @@ -1733,4 +1734,4 @@ *} (*>*) -end \ No newline at end of file +end diff -r 7ff7325e3b4e -r 98f53ab3722e progtutorial.pdf Binary file progtutorial.pdf has changed