421 \emph{not} a simple pair \texttt{(ab, cde)} as one might errorneously |
421 \emph{not} a simple pair \texttt{(ab, cde)} as one might errorneously |
422 expects. The parser returns the ampty set in the other examples, |
422 expects. The parser returns the ampty set in the other examples, |
423 because they do not fit with what the parser is supposed to parse. |
423 because they do not fit with what the parser is supposed to parse. |
424 |
424 |
425 |
425 |
426 A slightly more complicated parser is \pcode{("a" || "b") ~ "c"}. |
426 A slightly more complicated parser is \pcode{("a" || "b") ~ "c"} |
427 which parses as first character either an \texttt{a} or \texttt{b} |
427 which parses as first character either an \texttt{a} or \texttt{b} |
428 followed by a \texttt{c}. This parser produces the following outputs. |
428 followed by a \texttt{c}. This parser produces the following outputs. |
429 |
429 |
430 \begin{center} |
430 \begin{center} |
431 \begin{tabular}{rcl} |
431 \begin{tabular}{rcl} |
432 input strings & & output\medskip\\ |
432 input strings & & output\medskip\\ |
433 \texttt{\Grid{abc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\ |
433 \texttt{\Grid{acde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{a}}, \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\ |
434 \texttt{\Grid{bbc}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{b}}), \texttt{\Grid{c}})\right\}$\\ |
434 \texttt{\Grid{bcde}} & $\rightarrow$ & $\left\{((\texttt{\Grid{b}}, \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\ |
435 \texttt{\Grid{aac}} & $\rightarrow$ & $\{\}$ |
435 \texttt{\Grid{abde}} & $\rightarrow$ & $\{\}$ |
436 \end{tabular} |
436 \end{tabular} |
437 \end{center} |
437 \end{center} |
438 |
438 |
439 \noindent Two more examples: first consider the parser \pcode{('a' ~ |
439 \noindent |
440 'a') ~ 'a'} and the input \pcode{aaaa}: |
440 Now consider the parser \pcode{("a" ~ "b") ~ "c"} which parses |
|
441 \texttt{a}, \texttt{b}, \texttt{c} in sequence. This parser produces |
|
442 the following outputs. |
|
443 |
|
444 \begin{center} |
|
445 \begin{tabular}{rcl} |
|
446 input strings & & output\medskip\\ |
|
447 \texttt{\Grid{abcde}} & $\rightarrow$ & $\left\{(((\texttt{\Grid{a}},\texttt{\Grid{b}}), \texttt{\Grid{c}}), \texttt{\Grid{de}})\right\}$\\ |
|
448 \texttt{\Grid{abde}} & $\rightarrow$ & $\{\}$\\ |
|
449 \texttt{\Grid{bcde}} & $\rightarrow$ & $\{\}$ |
|
450 \end{tabular} |
|
451 \end{center} |
|
452 |
|
453 |
|
454 \noindent The second and third example fail, because something is |
|
455 ``missing'' in the sequence we are looking for. Also notice how the |
|
456 results nest with sequences: the parsed part is a nested pair of the |
|
457 form \pcode{((a, b), c)}. Two more examples: first consider the parser |
|
458 \pcode{("a" ~ "a") ~ "a"} and the input \pcode{aaaa}: |
441 |
459 |
442 \begin{center} |
460 \begin{center} |
443 \begin{tabular}{rcl} |
461 \begin{tabular}{rcl} |
444 input string & & output\medskip\\ |
462 input string & & output\medskip\\ |
445 \texttt{\Grid{aaaa}} & $\rightarrow$ & |
463 \texttt{\Grid{aaaa}} & $\rightarrow$ & |
446 $\left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\}$\\ |
464 $\left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\}$\\ |
447 \end{tabular} |
465 \end{tabular} |
448 \end{center} |
466 \end{center} |
449 |
467 |
450 \noindent Notice how the results nest deeper and deeper as |
468 \noindent Notice how the results nests deeper and deeper as pairs (the |
451 pairs (the last \pcode{a} is in the unprocessed part). To |
469 last \pcode{a} is in the unprocessed part). To consume everything of |
452 consume everything of this string we can use the parser |
470 this string we can use the parser \pcode{(("a" ~ "a") ~ "a") ~ |
453 \pcode{(('a' ~'a') ~ 'a') ~ 'a'}. Then the output is as |
471 "a"}. Then the output is as follows: |
454 follows: |
|
455 |
472 |
456 \begin{center} |
473 \begin{center} |
457 \begin{tabular}{rcl} |
474 \begin{tabular}{rcl} |
458 input string & & output\medskip\\ |
475 input string & & output\medskip\\ |
459 \texttt{\Grid{aaaa}} & $\rightarrow$ & |
476 \texttt{\Grid{aaaa}} & $\rightarrow$ & |
461 \end{tabular} |
478 \end{tabular} |
462 \end{center} |
479 \end{center} |
463 |
480 |
464 \noindent This is an instance where the parser consumed |
481 \noindent This is an instance where the parser consumed |
465 completely the input, meaning the unprocessed part is just the |
482 completely the input, meaning the unprocessed part is just the |
466 empty string. |
483 empty string. So if we called \pcode{parse_all} instead of \pcode{parse} |
|
484 we would get back the result |
|
485 |
|
486 \[ |
|
487 \left\{(((\texttt{\Grid{a}}, \texttt{\Grid{a}}), \texttt{\Grid{a}}), \texttt{\Grid{a}})\right\} |
|
488 \] |
|
489 |
|
490 \noindent where the unprocessed (empty) parts have been stripped away |
|
491 from the pairs; everything where the second part was not empty has |
|
492 been thrown away as ultimately-unsuccessful-parses. |
467 |
493 |
468 |
494 |
469 Note carefully that constructing a parser such \pcode{'a' || |
495 Note carefully that constructing a parser such \pcode{'a' || |
470 ('a' ~ 'b')} will result in a typing error. The first |
496 ('a' ~ 'b')} will result in a typing error. The first |
471 parser has as output type a single character (recall the type |
497 parser has as output type a single character (recall the type |