handouts/ho05.tex
changeset 459 780486571e38
parent 385 7f8516ff408d
child 545 76a98ed71a2a
equal deleted inserted replaced
458:896a5f91838d 459:780486571e38
     1 \documentclass{article}
     1 \documentclass{article}
     2 \usepackage{../style}
     2 \usepackage{../style}
     3 \usepackage{../langs}
     3 \usepackage{../langs}
     4 
     4 \usepackage{../grammar}
     5 
     5 
     6 \begin{document}
     6 \begin{document}
     7 
     7 
     8 \section*{Handout 5 (Grammars \& Parser)}
     8 \section*{Handout 5 (Grammars \& Parser)}
     9 
     9 
    46 \noindent Context-free languages play an important role in
    46 \noindent Context-free languages play an important role in
    47 `day-to-day' text processing and in programming languages.
    47 `day-to-day' text processing and in programming languages.
    48 Context-free languages are usually specified by grammars. For
    48 Context-free languages are usually specified by grammars. For
    49 example a grammar for well-parenthesised expressions is
    49 example a grammar for well-parenthesised expressions is
    50 
    50 
    51 \begin{center}
    51 \begin{plstx}[margin=3cm]
    52 $P \;\;\rightarrow\;\; ( \cdot  P \cdot ) \cdot P \;|\; \epsilon$
    52 : \meta{P} ::=  ( \cdot  \meta{P} \cdot ) \cdot \meta{P}
    53 \end{center}
    53              | \epsilon\\ 
    54  
    54 \end{plstx}
       
    55 
    55 \noindent 
    56 \noindent 
    56 or a grammar for recognising strings consisting of ones is
    57 or a grammar for recognising strings consisting of ones is
    57 
    58 
    58 \begin{center}
    59 \begin{plstx}[margin=3cm]
    59 $O \;\;\rightarrow\;\; 1 \cdot  O \;|\; 1$
    60 : \meta{O} ::= 1 \cdot  \meta{O} 
    60 \end{center}
    61              | 1\\
       
    62 \end{plstx}
    61 
    63 
    62 In general grammars consist of finitely many rules built up
    64 In general grammars consist of finitely many rules built up
    63 from \emph{terminal symbols} (usually lower-case letters) and
    65 from \emph{terminal symbols} (usually lower-case letters) and
    64 \emph{non-terminal symbols} (upper-case letters). Rules have
    66 \emph{non-terminal symbols} (upper-case letters inside \meta{\mbox{}}). Rules have
    65 the shape
    67 the shape
    66 
    68 
    67 \begin{center}
    69 \begin{plstx}[margin=3cm]
    68 $NT \;\;\rightarrow\;\; \textit{rhs}$
    70 : \meta{NT} ::= rhs\\
    69 \end{center}
    71 \end{plstx}
    70  
    72  
    71 \noindent where on the left-hand side is a single non-terminal
    73 \noindent where on the left-hand side is a single non-terminal
    72 and on the right a string consisting of both terminals and
    74 and on the right a string consisting of both terminals and
    73 non-terminals including the $\epsilon$-symbol for indicating
    75 non-terminals including the $\epsilon$-symbol for indicating
    74 the empty string. We use the convention to separate components
    76 the empty string. We use the convention to separate components
    75 on the right hand-side by using the $\cdot$ symbol, as in the
    77 on the right hand-side by using the $\cdot$ symbol, as in the
    76 grammar for well-parenthesised expressions. We also use the
    78 grammar for well-parenthesised expressions. We also use the
    77 convention to use $|$ as a shorthand notation for several
    79 convention to use $|$ as a shorthand notation for several
    78 rules. For example
    80 rules. For example
    79 
    81 
    80 \begin{center}
    82 \begin{plstx}[margin=3cm]
    81 $NT \;\;\rightarrow\;\; \textit{rhs}_1 \;|\; \textit{rhs}_2$
    83 : \meta{NT} ::= rhs_1
    82 \end{center}
    84    | rhs_2\\
    83 
    85 \end{plstx}
    84 \noindent means that the non-terminal $NT$ can be replaced by
    86 
       
    87 \noindent means that the non-terminal \meta{NT} can be replaced by
    85 either $\textit{rhs}_1$ or $\textit{rhs}_2$. If there are more
    88 either $\textit{rhs}_1$ or $\textit{rhs}_2$. If there are more
    86 than one non-terminal on the left-hand side of the rules, then
    89 than one non-terminal on the left-hand side of the rules, then
    87 we need to indicate what is the \emph{starting} symbol of the
    90 we need to indicate what is the \emph{starting} symbol of the
    88 grammar. For example the grammar for arithmetic expressions
    91 grammar. For example the grammar for arithmetic expressions
    89 can be given as follows
    92 can be given as follows
    90 
    93 
    91 \begin{center}
    94 \begin{plstx}[margin=3cm,one per line]
    92 \begin{tabular}{lcl@{\hspace{2cm}}l}
    95 \mbox{\rm (1)}: \meta{E} ::= \meta{N}\\
    93 $E$ & $\rightarrow$ &  $N$                 & (1)\\
    96 \mbox{\rm (2)}: \meta{E} ::= \meta{E} \cdot + \cdot \meta{E}\\
    94 $E$ & $\rightarrow$ &  $E \cdot + \cdot E$ & (2)\\
    97 \mbox{\rm (3)}: \meta{E} ::= \meta{E} \cdot - \cdot \meta{E}\\
    95 $E$ & $\rightarrow$ &  $E \cdot - \cdot E$ & (3)\\
    98 \mbox{\rm (4)}: \meta{E} ::= \meta{E} \cdot * \cdot \meta{E}\\
    96 $E$ & $\rightarrow$ &  $E \cdot * \cdot E$ & (4)\\
    99 \mbox{\rm (5)}: \meta{E} ::= ( \cdot \meta{E} \cdot )\\
    97 $E$ & $\rightarrow$ &  $( \cdot E \cdot )$ & (5)\\
   100 \mbox{\rm (6\ldots)}: \meta{N} ::= \meta{N} \cdot \meta{N} 
    98 $N$ & $\rightarrow$ & $N \cdot N \;|\; 0 \;|\; 1 \;|\: \ldots \;|\; 9$ & (6\ldots)
   101   \mid 0 \mid 1 \mid \ldots \mid 9\\
    99 \end{tabular}
   102 \end{plstx}
   100 \end{center}
   103 
   101 
   104 \noindent where \meta{E} is the starting symbol. A
   102 \noindent where $E$ is the starting symbol. A
       
   103 \emph{derivation} for a grammar starts with the starting
   105 \emph{derivation} for a grammar starts with the starting
   104 symbol of the grammar and in each step replaces one
   106 symbol of the grammar and in each step replaces one
   105 non-terminal by a right-hand side of a rule. A derivation ends
   107 non-terminal by a right-hand side of a rule. A derivation ends
   106 with a string in which only terminal symbols are left. For
   108 with a string in which only terminal symbols are left. For
   107 example a derivation for the string $(1 + 2) + 3$ is as
   109 example a derivation for the string $(1 + 2) + 3$ is as
   108 follows:
   110 follows:
   109 
   111 
   110 \begin{center}
   112 \begin{center}
   111 \begin{tabular}{lll@{\hspace{2cm}}l}
   113 \begin{tabular}{lll@{\hspace{2cm}}l}
   112 $E$ & $\rightarrow$ & $E+E$          & by (2)\\
   114 \meta{E} & $\rightarrow$ & $\meta{E}+\meta{E}$          & by (2)\\
   113        & $\rightarrow$ & $(E)+E$     & by (5)\\
   115        & $\rightarrow$ & $(\meta{E})+\meta{E}$     & by (5)\\
   114        & $\rightarrow$ & $(E+E)+E$   & by (2)\\
   116        & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{E}$   & by (2)\\
   115        & $\rightarrow$ & $(E+E)+N$   & by (1)\\
   117        & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{N}$   & by (1)\\
   116        & $\rightarrow$ & $(E+E)+3$   & by (6\dots)\\
   118        & $\rightarrow$ & $(\meta{E}+\meta{E})+3$   & by (6\dots)\\
   117        & $\rightarrow$ & $(N+E)+3$   & by (1)\\
   119        & $\rightarrow$ & $(\meta{N}+\meta{E})+3$   & by (1)\\
   118        & $\rightarrow^+$ & $(1+2)+3$ & by (1, 6\ldots)\\
   120        & $\rightarrow^+$ & $(1+2)+3$ & by (1, 6\ldots)\\
   119 \end{tabular} 
   121 \end{tabular} 
   120 \end{center}
   122 \end{center}
   121 
   123 
   122 \noindent where on the right it is indicated which 
   124 \noindent where on the right it is indicated which 
   237 left-associative). Unfortunately already the problem of
   239 left-associative). Unfortunately already the problem of
   238 deciding whether a grammar is ambiguous or not is in general
   240 deciding whether a grammar is ambiguous or not is in general
   239 undecidable. But in simple instance (the ones we deal in this
   241 undecidable. But in simple instance (the ones we deal in this
   240 module) one can usually see when a grammar is ambiguous.
   242 module) one can usually see when a grammar is ambiguous.
   241 
   243 
       
   244 
       
   245 \subsection*{Parser Combinators}
       
   246 
   242 Let us now turn to the problem of generating a parse-tree for
   247 Let us now turn to the problem of generating a parse-tree for
   243 a grammar and string. In what follows we explain \emph{parser
   248 a grammar and string. In what follows we explain \emph{parser
   244 combinators}, because they are easy to implement and closely
   249 combinators}, because they are easy to implement and closely
   245 resemble grammar rules. Imagine that a grammar describes the
   250 resemble grammar rules. Imagine that a grammar describes the
   246 strings of natural numbers, such as the grammar $N$ shown
   251 strings of natural numbers, such as the grammar $N$ shown