15 \section*{Handout 5 (Grammars \& Parser)} |
15 \section*{Handout 5 (Grammars \& Parser)} |
16 |
16 |
17 While regular expressions are very useful for lexing and for recognising |
17 While regular expressions are very useful for lexing and for recognising |
18 many patterns in strings (like email addresses), they have their |
18 many patterns in strings (like email addresses), they have their |
19 limitations. For example there is no regular expression that can |
19 limitations. For example there is no regular expression that can |
20 recognise the language $a^nb^n$ (where you have strings with $n$ $a$'s |
20 recognise the language $a^nb^n$ (where you have strings starting with $n$ $a$'s |
21 followed by the same amount of $b$'s). Another example for which there |
21 followed by the same amount of $b$'s). Another example for which there |
22 exists no regular expression is the language of well-parenthesised |
22 exists no regular expression is the language of well-parenthesised |
23 expressions. In languages like Lisp, which use parentheses rather |
23 expressions. In languages like Lisp, which use parentheses rather |
24 extensively, it might be of interest to know whether the following two |
24 extensively, it might be of interest to know whether the following two |
25 expressions are well-parenthesised or not (the left one is, the right |
25 expressions are well-parenthesised or not (the left one is, the right |
64 languages. Context-free in this setting means that ``words'' have one |
64 languages. Context-free in this setting means that ``words'' have one |
65 meaning only and this meaning is independent from the context |
65 meaning only and this meaning is independent from the context |
66 the ``words'' appear in. For example ambiguity issues like |
66 the ``words'' appear in. For example ambiguity issues like |
67 |
67 |
68 \begin{center} |
68 \begin{center} |
69 \tt Time flies like an arrow; fruit flies like bananas. |
69 \tt Time flies like an arrow. Fruit flies like bananas. |
70 \end{center} |
70 \end{center} |
71 |
71 |
72 \noindent |
72 \noindent |
73 from natural languages were the meaning of \emph{flies} depends on the |
73 from natural languages were the meaning of \emph{flies} depends on the |
74 surrounding \emph{context} are avoided as much as possible. |
74 surrounding \emph{context} are avoided as much as possible. |
464 a terminal, or $\meta{A} ::= \meta{B}\cdot \meta{C}$, where $\meta{B}$ and |
464 a terminal, or $\meta{A} ::= \meta{B}\cdot \meta{C}$, where $\meta{B}$ and |
465 $\meta{B}$ need to be non-terminals. And no rule can contain $\epsilon$. |
465 $\meta{B}$ need to be non-terminals. And no rule can contain $\epsilon$. |
466 The following grammar is in Chomsky normalform: |
466 The following grammar is in Chomsky normalform: |
467 |
467 |
468 \begin{plstx}[margin=1cm] |
468 \begin{plstx}[margin=1cm] |
469 : \meta{S\/} ::= \meta{N}\cdot \meta{P}\\ |
469 : \meta{S} ::= \meta{N}\cdot \meta{P}\\ |
470 : \meta{P\/} ::= \meta{V}\cdot \meta{N}\\ |
470 : \meta{P} ::= \meta{V}\cdot \meta{N}\\ |
471 : \meta{N\/} ::= \meta{N}\cdot \meta{N}\\ |
471 : \meta{N} ::= \meta{N}\cdot \meta{N}\\ |
472 : \meta{N\/} ::= \meta{A}\cdot \meta{N}\\ |
472 : \meta{N} ::= \meta{A}\cdot \meta{N}\\ |
473 : \meta{N\/} ::= \texttt{student} | \texttt{trainer} | \texttt{team} |
473 : \meta{N} ::= \texttt{student} | \texttt{trainer} | \texttt{team} |
474 | \texttt{trains}\\ |
474 | \texttt{trains}\\ |
475 : \meta{V\/} ::= \texttt{trains} | \texttt{team}\\ |
475 : \meta{V} ::= \texttt{trains} | \texttt{team}\\ |
476 : \meta{A\/} ::= \texttt{The} | \texttt{the}\\ |
476 : \meta{A} ::= \texttt{The} | \texttt{the}\\ |
477 \end{plstx} |
477 \end{plstx} |
478 |
478 |
479 \noindent |
479 \noindent |
480 where $\meta{S}$ is the start symbol and $\meta{S}$, $\meta{P}$, |
480 where $\meta{S}$ is the start symbol and $\meta{S}$, $\meta{P}$, |
481 $\meta{N}$, $\meta{V}$ and $\meta{A}$ are non-terminals. The ``words'' |
481 $\meta{N}$, $\meta{V}$ and $\meta{A}$ are non-terminals. The ``words'' |
491 |
491 |
492 \noindent |
492 \noindent |
493 is recognised by the grammar. The CYK algorithm starts with the |
493 is recognised by the grammar. The CYK algorithm starts with the |
494 following triangular data structure. |
494 following triangular data structure. |
495 |
495 |
496 TBD |
496 \begin{figure}[t] |
|
497 \begin{center} |
|
498 \begin{tikzpicture}[scale=0.8,line width=0.8mm] |
|
499 \draw (-2,0) -- (4,0); |
|
500 \draw (-2,1) -- (4,1); |
|
501 \draw (-2,2) -- (3,2); |
|
502 \draw (-2,3) -- (2,3); |
|
503 \draw (-2,4) -- (1,4); |
|
504 \draw (-2,5) -- (0,5); |
|
505 \draw (-2,6) -- (-1,6); |
|
506 |
|
507 \draw (0,0) -- (0, 5); |
|
508 \draw (1,0) -- (1, 4); |
|
509 \draw (2,0) -- (2, 3); |
|
510 \draw (3,0) -- (3, 2); |
|
511 \draw (4,0) -- (4, 1); |
|
512 \draw (-1,0) -- (-1, 6); |
|
513 \draw (-2,0) -- (-2, 6); |
|
514 |
|
515 \draw (-1.5,-0.5) node {\footnotesize{}\texttt{The}}; |
|
516 \draw (-0.5,-1.0) node {\footnotesize{}\texttt{trainer}}; |
|
517 \draw ( 0.5,-0.5) node {\footnotesize{}\texttt{trains}}; |
|
518 \draw ( 1.5,-1.0) node {\footnotesize{}\texttt{the}}; |
|
519 \draw ( 2.5,-0.5) node {\footnotesize{}\texttt{student}}; |
|
520 \draw ( 3.5,-1.0) node {\footnotesize{}\texttt{team}}; |
|
521 |
|
522 \draw (-1.5,0.5) node {$A$}; |
|
523 \draw (-0.5,0.5) node {$N$}; |
|
524 \draw ( 0.5,0.5) node {$N,V$}; |
|
525 \draw ( 1.5,0.5) node {$A$}; |
|
526 \draw ( 2.5,0.5) node {$N$}; |
|
527 \draw ( 3.5,0.5) node {$N,V$}; |
|
528 |
|
529 \draw (-2.4, 5.5) node {$1$}; |
|
530 \draw (-2.4, 4.5) node {$2$}; |
|
531 \draw (-2.4, 3.5) node {$3$}; |
|
532 \draw (-2.4, 2.5) node {$4$}; |
|
533 \draw (-2.4, 1.5) node {$5$}; |
|
534 \draw (-2.4, 0.5) node {$6$}; |
|
535 \end{tikzpicture} |
|
536 \end{center} |
|
537 \end{figure} |
497 |
538 |
498 \end{document} |
539 \end{document} |
499 |
540 |
500 |
541 |
501 %%% Parser combinators are now part of handout 6 |
542 %%% Parser combinators are now part of handout 6 |