25 The general idea behind parser combinators is to transform the input |
25 The general idea behind parser combinators is to transform the input |
26 into sets of pairs, like so |
26 into sets of pairs, like so |
27 |
27 |
28 \begin{center} |
28 \begin{center} |
29 $\underbrace{\text{list of tokens}}_{\text{input}}$ |
29 $\underbrace{\text{list of tokens}}_{\text{input}}$ |
30 $\Rightarrow$ |
30 $\quad\Rightarrow\quad$ |
31 $\underbrace{\text{set of (parsed part, unprocessed part)}}_{\text{output}}$ |
31 $\underbrace{\text{set of (parsed part, unprocessed part)}}_{\text{output}}$ |
32 \end{center} |
32 \end{center} |
33 |
33 |
34 \noindent |
34 \noindent |
35 Given the extended effort we have spent implementing a lexer in order |
35 Given the extended effort we have spent implementing a lexer in order |
80 parser cannot recognise anything from the input at all, then parser |
80 parser cannot recognise anything from the input at all, then parser |
81 combinators just return the empty set $\{\}$. This will indicate |
81 combinators just return the empty set $\{\}$. This will indicate |
82 something ``went wrong''\ldots or more precisely, nothing could be |
82 something ``went wrong''\ldots or more precisely, nothing could be |
83 parsed. |
83 parsed. |
84 |
84 |
85 Also important to note is that the type \texttt{T} for the processed |
85 Also important to note is that the output type \texttt{T} for the |
86 part is different from the input type \texttt{I} in the parse. In the |
86 processed part can potentially be different from the input type |
87 example above is just happens to be the same. The reason for the |
87 \texttt{I} in the parser. In the example above is just happens to be |
88 difference is that in general we are interested in |
88 the same. The reason for the difference is that in general we are |
89 transforming our input into something ``different''\ldots for example |
89 interested in transforming our input into something |
90 into a tree; or if we implement the grammar for arithmetic |
90 ``different''\ldots for example into a tree; or if we implement the |
91 expressions, we might be interested in the actual integer number the |
91 grammar for arithmetic expressions, we might be interested in the |
92 arithmetic expression, say \texttt{1 + 2 * 3}, stands for. In this way |
92 actual integer number the arithmetic expression, say \texttt{1 + 2 * |
93 we can use parser combinators to implement relatively easily a |
93 3}, stands for. In this way we can use parser combinators to |
94 calculator, for instance. |
94 implement relatively easily a calculator, for instance (we shall do |
95 |
95 this later on). |
96 The main idea of parser combinators is that we can easily build parser |
96 |
97 combinators out of smaller components following very closely the |
97 The main driving force behind parser combinators is that we can easily |
98 structure of a grammar. In order to implement this in a |
98 build parser combinators out of smaller components following very |
|
99 closely the structure of a grammar. In order to implement this in a |
99 functional/object-oriented programming language, like Scala, we need |
100 functional/object-oriented programming language, like Scala, we need |
100 to specify an abstract class for parser combinators. In the abstract |
101 to specify an abstract class for parser combinators. In the abstract |
101 class we specify that \texttt{I} is the \emph{input type} of the |
102 class we specify that \texttt{I} is the \emph{input type} of the |
102 parser combinator and that \texttt{T} is the \emph{output type}. This |
103 parser combinator and that \texttt{T} is the \emph{output type}. This |
103 implies that the function \texttt{parse} takes an argument of type |
104 implies that the function \texttt{parse} takes an argument of type |
876 ("(" ~ E ~ ")") ==> { case ((x, y), z) => y } | NumParserInt |
877 ("(" ~ E ~ ")") ==> { case ((x, y), z) => y } | NumParserInt |
877 \end{lstlisting} |
878 \end{lstlisting} |
878 \end{center} |
879 \end{center} |
879 |
880 |
880 \noindent |
881 \noindent |
881 Let us try out on some examples: |
882 Let us try out some examples: |
882 |
883 |
883 \begin{center} |
884 \begin{center} |
884 \begin{tabular}{rcl} |
885 \begin{tabular}{rcl} |
885 input strings & & output of \pcode{parse_all}\medskip\\ |
886 input strings & & output of \pcode{parse_all}\medskip\\ |
886 \texttt{\Grid{1+2+3}} & $\rightarrow$ & \texttt{Set(6)}\\ |
887 \texttt{\Grid{1+2+3}} & $\rightarrow$ & \texttt{Set(6)}\\ |
887 \texttt{\Grid{4*2+3}} & $\rightarrow$ & \texttt{Set(11)}\\ |
888 \texttt{\Grid{4*2+3}} & $\rightarrow$ & \texttt{Set(11)}\\ |
888 \texttt{\Grid{4*(2+3)}} & $\rightarrow$ & \texttt{Set(20)}\\ |
889 \texttt{\Grid{4*(2+3)}} & $\rightarrow$ & \texttt{Set(20)}\\ |
|
890 \texttt{\Grid{(4)*((2+3))}} & $\rightarrow$ & \texttt{Set(20)}\\ |
889 \texttt{\Grid{4/2+3}} & $\rightarrow$ & \texttt{Set()}\\ |
891 \texttt{\Grid{4/2+3}} & $\rightarrow$ & \texttt{Set()}\\ |
890 \texttt{\Grid{1\VS +\VS 2\VS +\VS 3}} & $\rightarrow$ & \texttt{Set()}\\ |
892 \texttt{\Grid{1\VS +\VS 2\VS +\VS 3}} & $\rightarrow$ & \texttt{Set()}\\ |
891 \end{tabular} |
893 \end{tabular} |
892 \end{center} |
894 \end{center} |
893 |
895 |
894 \noindent |
896 \noindent |
895 All examples should be quite self-explanatory: the last two do not |
897 Note that we call \pcode{parse_all}, not \pcode{parse}. The examples |
896 produce any result because our parser did not define what to do in |
898 should be quite self-explanatory. The last two example do not produce |
897 case of division (could be easily added) but also has no idea what to |
899 any integer result because our parser does not define what to do in |
898 do with whitescpaces. To deal with them is the task of the lexer. We |
900 case of division (could be easily added), but also has no idea what to |
899 can deal with them inside the grammar, but that would render many |
901 do with whitescpaces. To deal with them is the task of the lexer! Yes, |
900 grammars becoming unintelligible. |
902 we can deal with them inside the grammar, but that would render many |
|
903 grammars becoming unintelligible, including this one.\footnote{If you |
|
904 think an easy solution is to extend the notion of what a number |
|
905 should be, then think again---you still would have to deal with |
|
906 cases like \texttt{\Grid{(\VS (\VS 2+3)\VS )}}. Jusat think you have |
|
907 a grammar for a full-blown language where there are numerous such cases.} |
901 |
908 |
902 \end{document} |
909 \end{document} |
903 |
910 |
904 %%% Local Variables: |
911 %%% Local Variables: |
905 %%% mode: latex |
912 %%% mode: latex |