handouts/ho05.tex
changeset 681 7b7736bea3ca
parent 680 eecc4d5a2172
child 682 553b4d4e3719
equal deleted inserted replaced
680:eecc4d5a2172 681:7b7736bea3ca
   457 complexity for parsing? It turns out that this is $O(n^3)$ for context-free
   457 complexity for parsing? It turns out that this is $O(n^3)$ for context-free
   458 languages. 
   458 languages. 
   459 
   459 
   460 To answer the question about complexity, let me describe next the CYK
   460 To answer the question about complexity, let me describe next the CYK
   461 algorithm (named after the authors Cocke–Younger–Kasami). This algorithm
   461 algorithm (named after the authors Cocke–Younger–Kasami). This algorithm
   462 works with grammars that are in Chomsky normalform. 
   462 works with grammars that are in \emph{Chomsky normalform}. In Chomsky
       
   463 normalform all rules must be of the form $\meta{A} ::= a$, where $a$ is 
       
   464 a terminal, or $\meta{A} ::= \meta{B}\cdot \meta{C}$, where $\meta{B}$ and
       
   465 $\meta{B}$ need to be non-terminals. And no rule can contain $\epsilon$.
       
   466 The following grammar is in Chomsky normalform:
       
   467 
       
   468 \begin{plstx}[margin=1cm]
       
   469   : \meta{S\/} ::= \meta{N}\cdot \meta{P}\\
       
   470   : \meta{P\/} ::= \meta{V}\cdot \meta{N}\\
       
   471   : \meta{N\/} ::= \meta{N}\cdot \meta{N}\\
       
   472   : \meta{N\/} ::= \meta{A}\cdot \meta{N}\\
       
   473   : \meta{N\/} ::= \texttt{student} | \texttt{trainer} | \texttt{team} 
       
   474                    | \texttt{trains}\\
       
   475   : \meta{V\/} ::= \texttt{trains} | \texttt{team}\\
       
   476   : \meta{A\/} ::= \texttt{The} | \texttt{the}\\
       
   477 \end{plstx}
       
   478 
       
   479 \noindent
       
   480 where $\meta{S}$ is the start symbol and $\meta{S}$, $\meta{P}$,
       
   481 $\meta{N}$, $\meta{V}$ and $\meta{A}$ are non-terminals. The ``words''
       
   482 are terminals. The rough idea behind this grammar is that $\meta{S}$
       
   483 stands for a sentence, $\meta{P}$ is a predicate, $\meta{N}$ is a noun
       
   484 and so on. For example the rule \mbox{$\meta{P} ::= \meta{V}\cdot
       
   485 \meta{N}$} states that a predicate can be a verb followed by a noun.
       
   486 Now the question is whether the string 
       
   487 
       
   488 \begin{center}
       
   489   \texttt{The trainer trains the student team}
       
   490 \end{center}
       
   491 
       
   492 \noindent
       
   493 is recognised by the grammar. The CYK algorithm starts with the
       
   494 following triangular data structure.
   463 
   495 
   464 TBD
   496 TBD
   465 
   497 
   466 \end{document}
   498 \end{document}
   467 
   499