457 complexity for parsing? It turns out that this is $O(n^3)$ for context-free |
457 complexity for parsing? It turns out that this is $O(n^3)$ for context-free |
458 languages. |
458 languages. |
459 |
459 |
460 To answer the question about complexity, let me describe next the CYK |
460 To answer the question about complexity, let me describe next the CYK |
461 algorithm (named after the authors Cocke–Younger–Kasami). This algorithm |
461 algorithm (named after the authors Cocke–Younger–Kasami). This algorithm |
462 works with grammars that are in Chomsky normalform. |
462 works with grammars that are in \emph{Chomsky normalform}. In Chomsky |
|
463 normalform all rules must be of the form $\meta{A} ::= a$, where $a$ is |
|
464 a terminal, or $\meta{A} ::= \meta{B}\cdot \meta{C}$, where $\meta{B}$ and |
|
465 $\meta{B}$ need to be non-terminals. And no rule can contain $\epsilon$. |
|
466 The following grammar is in Chomsky normalform: |
|
467 |
|
468 \begin{plstx}[margin=1cm] |
|
469 : \meta{S\/} ::= \meta{N}\cdot \meta{P}\\ |
|
470 : \meta{P\/} ::= \meta{V}\cdot \meta{N}\\ |
|
471 : \meta{N\/} ::= \meta{N}\cdot \meta{N}\\ |
|
472 : \meta{N\/} ::= \meta{A}\cdot \meta{N}\\ |
|
473 : \meta{N\/} ::= \texttt{student} | \texttt{trainer} | \texttt{team} |
|
474 | \texttt{trains}\\ |
|
475 : \meta{V\/} ::= \texttt{trains} | \texttt{team}\\ |
|
476 : \meta{A\/} ::= \texttt{The} | \texttt{the}\\ |
|
477 \end{plstx} |
|
478 |
|
479 \noindent |
|
480 where $\meta{S}$ is the start symbol and $\meta{S}$, $\meta{P}$, |
|
481 $\meta{N}$, $\meta{V}$ and $\meta{A}$ are non-terminals. The ``words'' |
|
482 are terminals. The rough idea behind this grammar is that $\meta{S}$ |
|
483 stands for a sentence, $\meta{P}$ is a predicate, $\meta{N}$ is a noun |
|
484 and so on. For example the rule \mbox{$\meta{P} ::= \meta{V}\cdot |
|
485 \meta{N}$} states that a predicate can be a verb followed by a noun. |
|
486 Now the question is whether the string |
|
487 |
|
488 \begin{center} |
|
489 \texttt{The trainer trains the student team} |
|
490 \end{center} |
|
491 |
|
492 \noindent |
|
493 is recognised by the grammar. The CYK algorithm starts with the |
|
494 following triangular data structure. |
463 |
495 |
464 TBD |
496 TBD |
465 |
497 |
466 \end{document} |
498 \end{document} |
467 |
499 |