handouts/ho06.tex
changeset 594 d40d7d7b85bc
parent 593 bb24d4e207b6
child 595 4bf0096bc06b
equal deleted inserted replaced
593:bb24d4e207b6 594:d40d7d7b85bc
    25 The general idea behind parser combinators is to transform the input
    25 The general idea behind parser combinators is to transform the input
    26 into sets of pairs, like so
    26 into sets of pairs, like so
    27 
    27 
    28 \begin{center}
    28 \begin{center}
    29 $\underbrace{\text{list of tokens}}_{\text{input}}$ 
    29 $\underbrace{\text{list of tokens}}_{\text{input}}$ 
    30 $\Rightarrow$
    30 $\quad\Rightarrow\quad$
    31 $\underbrace{\text{set of (parsed part, unprocessed part)}}_{\text{output}}$
    31 $\underbrace{\text{set of (parsed part, unprocessed part)}}_{\text{output}}$
    32 \end{center} 
    32 \end{center} 
    33 
    33 
    34 \noindent
    34 \noindent
    35 Given the extended effort we have spent implementing a lexer in order
    35 Given the extended effort we have spent implementing a lexer in order
    80 parser cannot recognise anything from the input at all, then parser
    80 parser cannot recognise anything from the input at all, then parser
    81 combinators just return the empty set $\{\}$. This will indicate
    81 combinators just return the empty set $\{\}$. This will indicate
    82 something ``went wrong''\ldots or more precisely, nothing could be
    82 something ``went wrong''\ldots or more precisely, nothing could be
    83 parsed.
    83 parsed.
    84 
    84 
    85 Also important to note is that the type \texttt{T} for the processed
    85 Also important to note is that the output type \texttt{T} for the
    86 part is different from the input type \texttt{I} in the parse. In the
    86 processed part can potentially be different from the input type
    87 example above is just happens to be the same. The reason for the
    87 \texttt{I} in the parser. In the example above is just happens to be
    88 difference is that in general we are interested in
    88 the same. The reason for the difference is that in general we are
    89 transforming our input into something ``different''\ldots for example
    89 interested in transforming our input into something
    90 into a tree; or if we implement the grammar for arithmetic
    90 ``different''\ldots for example into a tree; or if we implement the
    91 expressions, we might be interested in the actual integer number the
    91 grammar for arithmetic expressions, we might be interested in the
    92 arithmetic expression, say \texttt{1 + 2 * 3}, stands for. In this way
    92 actual integer number the arithmetic expression, say \texttt{1 + 2 *
    93 we can use parser combinators to implement relatively easily a
    93   3}, stands for. In this way we can use parser combinators to
    94 calculator, for instance.
    94 implement relatively easily a calculator, for instance (we shall do
    95 
    95 this later on).
    96 The main idea of parser combinators is that we can easily build parser
    96 
    97 combinators out of smaller components following very closely the
    97 The main driving force behind parser combinators is that we can easily
    98 structure of a grammar. In order to implement this in a
    98 build parser combinators out of smaller components following very
       
    99 closely the structure of a grammar. In order to implement this in a
    99 functional/object-oriented programming language, like Scala, we need
   100 functional/object-oriented programming language, like Scala, we need
   100 to specify an abstract class for parser combinators. In the abstract
   101 to specify an abstract class for parser combinators. In the abstract
   101 class we specify that \texttt{I} is the \emph{input type} of the
   102 class we specify that \texttt{I} is the \emph{input type} of the
   102 parser combinator and that \texttt{T} is the \emph{output type}.  This
   103 parser combinator and that \texttt{T} is the \emph{output type}.  This
   103 implies that the function \texttt{parse} takes an argument of type
   104 implies that the function \texttt{parse} takes an argument of type
   876   ("(" ~ E ~ ")") ==> { case ((x, y), z) => y } | NumParserInt
   877   ("(" ~ E ~ ")") ==> { case ((x, y), z) => y } | NumParserInt
   877 \end{lstlisting}
   878 \end{lstlisting}
   878 \end{center}
   879 \end{center}
   879 
   880 
   880 \noindent
   881 \noindent
   881 Let us try out on some examples:
   882 Let us try out some examples:
   882 
   883 
   883 \begin{center}
   884 \begin{center}
   884 \begin{tabular}{rcl}
   885 \begin{tabular}{rcl}
   885   input strings & & output of \pcode{parse_all}\medskip\\
   886   input strings & & output of \pcode{parse_all}\medskip\\
   886   \texttt{\Grid{1+2+3}} & $\rightarrow$ & \texttt{Set(6)}\\
   887   \texttt{\Grid{1+2+3}} & $\rightarrow$ & \texttt{Set(6)}\\
   887   \texttt{\Grid{4*2+3}} & $\rightarrow$ & \texttt{Set(11)}\\
   888   \texttt{\Grid{4*2+3}} & $\rightarrow$ & \texttt{Set(11)}\\
   888   \texttt{\Grid{4*(2+3)}} & $\rightarrow$ & \texttt{Set(20)}\\
   889   \texttt{\Grid{4*(2+3)}} & $\rightarrow$ & \texttt{Set(20)}\\
       
   890   \texttt{\Grid{(4)*((2+3))}} & $\rightarrow$ & \texttt{Set(20)}\\
   889   \texttt{\Grid{4/2+3}} & $\rightarrow$ & \texttt{Set()}\\
   891   \texttt{\Grid{4/2+3}} & $\rightarrow$ & \texttt{Set()}\\
   890   \texttt{\Grid{1\VS +\VS 2\VS +\VS 3}} & $\rightarrow$ & \texttt{Set()}\\                      
   892   \texttt{\Grid{1\VS +\VS 2\VS +\VS 3}} & $\rightarrow$ & \texttt{Set()}\\                      
   891 \end{tabular}
   893 \end{tabular}
   892 \end{center}
   894 \end{center}
   893 
   895 
   894 \noindent
   896 \noindent
   895 All examples should be quite self-explanatory: the last two do not
   897 Note that we call \pcode{parse_all}, not \pcode{parse}.  The examples
   896 produce any result because our parser did not define what to do in
   898 should be quite self-explanatory. The last two example do not produce
   897 case of division (could be easily added) but also has no idea what to
   899 any integer result because our parser does not define what to do in
   898 do with whitescpaces. To deal with them is the task of the lexer. We
   900 case of division (could be easily added), but also has no idea what to
   899 can deal with them inside the grammar, but that would render many
   901 do with whitescpaces. To deal with them is the task of the lexer! Yes,
   900 grammars becoming unintelligible.
   902 we can deal with them inside the grammar, but that would render many
       
   903 grammars becoming unintelligible, including this one.\footnote{If you
       
   904   think an easy solution is to extend the notion of what a number
       
   905   should be, then think again---you still would have to deal with
       
   906   cases like \texttt{\Grid{(\VS (\VS 2+3)\VS )}}. Jusat think you have
       
   907   a grammar for a full-blown language where there are numerous such cases.}
   901 
   908 
   902 \end{document}
   909 \end{document}
   903 
   910 
   904 %%% Local Variables: 
   911 %%% Local Variables: 
   905 %%% mode: latex  
   912 %%% mode: latex