376 : \meta{T} ::= \meta{F} | \meta{F} \cdot * \cdot \meta{T}\\ |
376 : \meta{T} ::= \meta{F} | \meta{F} \cdot * \cdot \meta{T}\\ |
377 : \meta{F} ::= num\_token | ( \cdot \meta{E} \cdot )\\ |
377 : \meta{F} ::= num\_token | ( \cdot \meta{E} \cdot )\\ |
378 \end{plstx} |
378 \end{plstx} |
379 |
379 |
380 \noindent |
380 \noindent |
381 In this grammar all $\meta{E}$xpressions, $\meta{T}$erms and $\meta{F}$actors |
381 In this grammar all $\meta{E}$xpressions, $\meta{T}$erms and |
382 are in some way protected from being left-recusive. For example if you |
382 $\meta{F}$actors are in some way protected from being |
383 start $\meta{E}$ you can derive another one by going through $\meta{T}$, then |
383 left-recusive. For example if you start $\meta{E}$ you can derive |
384 $\meta{F}$, but then $\meta{E}$ is protected by the open-parenthesis. |
384 another one by going through $\meta{T}$, then $\meta{F}$, but then |
|
385 $\meta{E}$ is protected by the open-parenthesis in the last rule. |
385 |
386 |
386 \subsection*{Removing $\epsilon$-Rules and CYK-Algorithm} |
387 \subsection*{Removing $\epsilon$-Rules and CYK-Algorithm} |
387 |
388 |
388 I showed above that the non-left-recursive grammar for binary numbers is |
389 I showed above that the non-left-recursive grammar for binary numbers is |
389 |
390 |
393 \end{plstx} |
394 \end{plstx} |
394 |
395 |
395 \noindent |
396 \noindent |
396 The transformation made the original grammar non-left-recursive, but at |
397 The transformation made the original grammar non-left-recursive, but at |
397 the expense of introducing an $\epsilon$ in the second rule. Having an |
398 the expense of introducing an $\epsilon$ in the second rule. Having an |
398 explicit $\epsilon$-rule is annoying to, not in terms of looping, but in |
399 explicit $\epsilon$-rule is annoying, not in terms of looping, but in |
399 terms of efficiency. The reason is that the $\epsilon$-rule always |
400 terms of efficiency. The reason is that the $\epsilon$-rule always |
400 applies but since it recognises the empty string, it does not make any |
401 applies but since it recognises the empty string, it does not make any |
401 progress with recognising a string. Better are rules like $( \cdot |
402 progress with recognising a string. Better are rules like $( \cdot |
402 \meta{E} \cdot )$ where something of the input is consumed. Getting |
403 \meta{E} \cdot )$ where something of the input is consumed. Getting |
403 rid of $\epsilon$-rules is also important for the CYK parsing algorithm, |
404 rid of $\epsilon$-rules is also important for the CYK parsing algorithm, |
453 \end{plstx} |
454 \end{plstx} |
454 |
455 |
455 \noindent |
456 \noindent |
456 I let you think about whether this grammar can still recognise all |
457 I let you think about whether this grammar can still recognise all |
457 binary numbers and whether this grammar is non-left-recursive. The |
458 binary numbers and whether this grammar is non-left-recursive. The |
458 precise statement for the transformation of removing $\epsilon$-rules is |
459 precise statement for the transformation of removing $\epsilon$-rules |
459 that if the original grammar was able to recognise only non-empty |
460 is that if the original grammar was able to recognise only non-empty |
460 strings, then the transformed grammar will be equivalent (matching the |
461 strings, then the transformed grammar will be equivalent (matching the |
461 same set of strings); if the original grammar was able to match the |
462 same set of strings); if the original grammar was able to match the |
462 empty string, then the transformed grammar will be able to match the |
463 empty string, then the transformed grammar will be able to match the |
463 same strings, \emph{except} the empty string. So the $\epsilon$-removal |
464 same strings, \emph{except} the empty string. So the |
464 does not preserve equivalence of grammars, but the small defect with the |
465 $\epsilon$-removal does not preserve equivalence of grammars in |
465 empty string is not important for practical purposes. |
466 general, but the small defect with the empty string is not important |
|
467 for practical purposes. |
466 |
468 |
467 So why are these transformations all useful? Well apart from making the |
469 So why are these transformations all useful? Well apart from making the |
468 parser combinators work (remember they cannot deal with left-recursion and |
470 parser combinators work (remember they cannot deal with left-recursion and |
469 are inefficient with $\epsilon$-rules), a second reason is that they help |
471 are inefficient with $\epsilon$-rules), a second reason is that they help |
470 with getting any insight into the complexity of the parsing problem. |
472 with getting any insight into the complexity of the parsing problem. |
561 |
563 |
562 |
564 |
563 \noindent |
565 \noindent |
564 The last row contains the information about all words and their |
566 The last row contains the information about all words and their |
565 corresponding non-terminals. For example the field for \texttt{trains} |
567 corresponding non-terminals. For example the field for \texttt{trains} |
566 contains the information $\meta{N}$ and $\meta{V}$ because it can be a |
568 contains the information $\meta{N}$ and $\meta{V}$ because \texttt{trains} can be a |
567 ``verb'' and a ``noun'' according to the grammar. The row above, |
569 ``verb'' and a ``noun'' according to the grammar. The row above, |
568 let's call the corresponding fields 5a to 5e, contains information |
570 let's call the corresponding fields 5a to 5e, contains information |
569 about 2-word parts of the sentence, namely |
571 about \underline{2-word} parts of the sentence, namely |
570 |
572 |
571 \begin{center} |
573 \begin{center} |
572 \begin{tabular}{llll} |
574 \begin{tabular}{llll} |
573 5a) & $\underbrace{\texttt{The}}_{A}$ $\mid$ $\underbrace{\texttt{trainer}}_{N}$ \\ |
575 5a) & $\underbrace{\texttt{The}}_{A}$ $\mid$ $\underbrace{\texttt{trainer}}_{N}$ \\ |
574 5b) & $\underbrace{\texttt{trainer}}_{N}$ $\mid$ $\underbrace{\texttt{trains}}_{N,V}$\\ |
576 5b) & $\underbrace{\texttt{trainer}}_{N}$ $\mid$ $\underbrace{\texttt{trains}}_{N,V}$\\ |