540 \draw ( 0.5,0.5) node {$N,V$}; |
541 \draw ( 0.5,0.5) node {$N,V$}; |
541 \draw ( 1.5,0.5) node {$A$}; |
542 \draw ( 1.5,0.5) node {$A$}; |
542 \draw ( 2.5,0.5) node {$N$}; |
543 \draw ( 2.5,0.5) node {$N$}; |
543 \draw ( 3.5,0.5) node {$N,V$}; |
544 \draw ( 3.5,0.5) node {$N,V$}; |
544 |
545 |
|
546 % \draw (-1.5,1.5) node {\small{}a}; |
|
547 % \draw (-0.5,1.5) node {\small{}b}; |
|
548 % \draw ( 0.5,1.5) node {\small{}c}; |
|
549 % \draw ( 1.5,1.5) node {\small{}d}; |
|
550 % \draw ( 2.5,1.5) node {\small{}e}; |
|
551 |
545 \draw (-2.4, 5.5) node {$1$}; |
552 \draw (-2.4, 5.5) node {$1$}; |
546 \draw (-2.4, 4.5) node {$2$}; |
553 \draw (-2.4, 4.5) node {$2$}; |
547 \draw (-2.4, 3.5) node {$3$}; |
554 \draw (-2.4, 3.5) node {$3$}; |
548 \draw (-2.4, 2.5) node {$4$}; |
555 \draw (-2.4, 2.5) node {$4$}; |
549 \draw (-2.4, 1.5) node {$5$}; |
556 \draw (-2.4, 1.5) node {$5$}; |
550 \draw (-2.4, 0.5) node {$6$}; |
557 \draw (-2.4, 0.5) node {$6$}; |
551 \end{tikzpicture} |
558 \end{tikzpicture} |
552 \end{center} |
559 \end{center} |
553 \end{figure} |
560 %%\end{figure} |
|
561 |
|
562 |
|
563 \noindent |
|
564 The last row contains the information about all words and their |
|
565 corresponding non-terminals. For example the field for \texttt{trains} |
|
566 contains the information $\meta{N}$ and $\meta{V}$ because it can be a |
|
567 ``verb'' and a ``noun'' according to the grammar. The row above, |
|
568 let's call the corresponding fields 5a to 5e, contains information |
|
569 about 2-word parts of the sentence, namely |
|
570 |
|
571 \begin{center} |
|
572 \begin{tabular}{llll} |
|
573 5a) & $\underbrace{\texttt{The}}_{A}$ $\mid$ $\underbrace{\texttt{trainer}}_{N}$ \\ |
|
574 5b) & $\underbrace{\texttt{trainer}}_{N}$ $\mid$ $\underbrace{\texttt{trains}}_{N,V}$\\ |
|
575 5c) & \texttt{trains} $\mid$ \texttt{the} \\ |
|
576 5d) & \texttt{the} $\mid$ \texttt{student} \\ |
|
577 5e) & \texttt{student} $\mid$ \texttt{team} \\ |
|
578 \end{tabular} |
|
579 \end{center} |
|
580 |
|
581 \noindent |
|
582 For each of them, we look up in row 6 which non-terminals it belongs to |
|
583 (indicated above for 5a and 5b). For 5a, with the non-terminals |
|
584 \meta{A} and \meta{N}, we find the grammar rule |
|
585 |
|
586 \begin{plstx}[margin=1cm] |
|
587 : \meta{N} ::= \meta{A}\cdot \meta{N}\\ |
|
588 \end{plstx} |
|
589 |
|
590 \noindent |
|
591 which means into field 5a we put the left-hand side of this rule, |
|
592 which in this case is the non-terminal \meta{N}. For 5b we have to check |
|
593 for both $\meta{N}\cdot\meta{N}$ and $\meta{N}\cdot\meta{V}$ whether there |
|
594 is a right-hand side of this form in the grammar. But only the |
|
595 grammar rule |
|
596 |
|
597 \begin{plstx}[margin=1cm] |
|
598 : \meta{N} ::= \meta{N}\cdot \meta{N}\\ |
|
599 \end{plstx} |
|
600 |
|
601 \noindent |
|
602 matches, which means 5b gets also an \meta{N}. Continuing for all |
|
603 fields in row 5 gives: |
|
604 |
|
605 \begin{center} |
|
606 \begin{tikzpicture}[scale=0.7,line width=0.8mm] |
|
607 \draw (-2,0) -- (4,0); |
|
608 \draw (-2,1) -- (4,1); |
|
609 \draw (-2,2) -- (3,2); |
|
610 \draw (-2,3) -- (2,3); |
|
611 \draw (-2,4) -- (1,4); |
|
612 \draw (-2,5) -- (0,5); |
|
613 \draw (-2,6) -- (-1,6); |
|
614 |
|
615 \draw (0,0) -- (0, 5); |
|
616 \draw (1,0) -- (1, 4); |
|
617 \draw (2,0) -- (2, 3); |
|
618 \draw (3,0) -- (3, 2); |
|
619 \draw (4,0) -- (4, 1); |
|
620 \draw (-1,0) -- (-1, 6); |
|
621 \draw (-2,0) -- (-2, 6); |
|
622 |
|
623 \draw (-1.5,-0.5) node {\footnotesize{}\texttt{The}}; |
|
624 \draw (-0.5,-1.0) node {\footnotesize{}\texttt{trainer}}; |
|
625 \draw ( 0.5,-0.5) node {\footnotesize{}\texttt{trains}}; |
|
626 \draw ( 1.5,-1.0) node {\footnotesize{}\texttt{the}}; |
|
627 \draw ( 2.5,-0.5) node {\footnotesize{}\texttt{student}}; |
|
628 \draw ( 3.5,-1.0) node {\footnotesize{}\texttt{team}}; |
|
629 |
|
630 \draw (-1.5,0.5) node {$A$}; |
|
631 \draw (-0.5,0.5) node {$N$}; |
|
632 \draw ( 0.5,0.5) node {$N,V$}; |
|
633 \draw ( 1.5,0.5) node {$A$}; |
|
634 \draw ( 2.5,0.5) node {$N$}; |
|
635 \draw ( 3.5,0.5) node {$N,V$}; |
|
636 |
|
637 \draw (-1.5,1.5) node {$N$}; |
|
638 \draw (-0.5,1.5) node {$N$}; |
|
639 \draw ( 0.5,1.5) node {$$}; |
|
640 \draw ( 1.5,1.5) node {$N$}; |
|
641 \draw ( 2.5,1.5) node {$N$}; |
|
642 |
|
643 |
|
644 % \draw (-1.5,1.5) node {\small{}a}; |
|
645 % \draw (-0.5,1.5) node {\small{}b}; |
|
646 % \draw ( 0.5,1.5) node {\small{}c}; |
|
647 % \draw ( 1.5,1.5) node {\small{}d}; |
|
648 % \draw ( 2.5,1.5) node {\small{}e}; |
|
649 |
|
650 \draw (-2.4, 5.5) node {$1$}; |
|
651 \draw (-2.4, 4.5) node {$2$}; |
|
652 \draw (-2.4, 3.5) node {$3$}; |
|
653 \draw (-2.4, 2.5) node {$4$}; |
|
654 \draw (-2.4, 1.5) node {$5$}; |
|
655 \draw (-2.4, 0.5) node {$6$}; |
|
656 \end{tikzpicture} |
|
657 \end{center} |
|
658 |
|
659 \noindent |
|
660 Now row 4 is in charge of all 3-word parts of the sentence, namely |
|
661 |
|
662 \begin{center} |
|
663 \begin{tabular}{llll} |
|
664 4a) & The $\mid$ trainer trains\\ |
|
665 & The trainer $\mid$ trains\\ |
|
666 4b) & trainer $\mid$ trains the\\ |
|
667 & trainer trains $\mid$ the\\ |
|
668 4c) & trains $\mid$ the student\\ |
|
669 & trains the $\mid$ student\\ |
|
670 4d) & the $\mid$ student team\\ |
|
671 & the student $\mid$ team\\ |
|
672 \end{tabular} |
|
673 \end{center} |
|
674 |
|
675 \noindent |
|
676 Note that in case of 3-word parts we have two splits. For example for |
|
677 4a: $\underbrace{\texttt{The}}_{A}$ and |
|
678 $\underbrace{\texttt{trainer trains}}_{N}$; and also |
|
679 $\underbrace{\texttt{The trainer}}_{N}$ and |
|
680 $\underbrace{\texttt{trains}}_{N,V}$. For each of these splits we have |
|
681 to look up in the rows below, which non-terminals we already |
|
682 computed. This allows us to look for right-hand sides in our grammar: |
|
683 $\meta{A}\cdot\meta{N}$, $\meta{N}\cdot\meta{N}$ and |
|
684 $\meta{N}\cdot\meta{V}$, which yield the only left-hand side |
|
685 \meta{N}. This is what we fill in for 4a. And so on for row 4. |
|
686 |
|
687 Row 3 is about all 4-word parts in the sentence, namely |
|
688 |
|
689 \begin{center} |
|
690 \begin{tabular}{llll} |
|
691 3a) & The trainer trains the\\ |
|
692 3b) & trainer trains the student\\ |
|
693 3c) & trains the student team\\ |
|
694 \end{tabular} |
|
695 \end{center} |
|
696 |
|
697 \noindent |
|
698 Each of them can be split up in 3 ways, for example |
|
699 |
|
700 \begin{center} |
|
701 \begin{tabular}{llll} |
|
702 3a) & The $\mid$ trainer trains the\\ |
|
703 & The trainer $\mid$ trains the\\ |
|
704 & The trainer trains $\mid$ the\\ |
|
705 \end{tabular} |
|
706 \end{center} |
|
707 |
|
708 \noindent |
|
709 and we have to consider them all in turn to fill in the non-terminals for |
|
710 3a. You can guess how it continues: row 2 is for all 5-word parts, and finally |
|
711 the field on the top is for the whole sentence. |
|
712 The idea of the CYK algorithm is that if in the top-field the starting |
|
713 symbol \meta{S} appears (possibly together with other non-terminals), |
|
714 then the sentence is accepted by the grammar. If it does not, then the |
|
715 sentence is not accepted. |
|
716 |
|
717 Let us very quickly calculate the complexity of the CYK |
|
718 algorithm. Lookup operations inside the triangle and in the grammar |
|
719 are assumed to be of constant time, $O(1)$, meaning they do not |
|
720 matter. How many fields are in the triangle\ldots |
|
721 $\frac{n}{2} * (n+1)$, where $n$ is the size of the input. That means |
|
722 roughly $O(n^2)$ fields. How much work do we have to do for each |
|
723 field? Well, for the top-most we have to consider $n-1$ splits, which |
|
724 means roughly $O(n)$ for each field. The overall result is a $O(n^3)$ |
|
725 time-complexity for CYK. It turns out that this is the best worst-time |
|
726 complexity for parsing with context-free grammars. |
554 |
727 |
555 \end{document} |
728 \end{document} |
556 |
729 |
557 |
730 |
558 %%% Parser combinators are now part of handout 6 |
731 %%% Parser combinators are now part of handout 6 |