\newcommand{\slidecaption}{AFL 06, King's College London, 30.~October 2013}
  \begin{tabular}{@ {}c@ {}}
  \LARGE Automata and \\[-2mm] 
  \LARGE Formal Languages (6)\\[3mm] 

\frametitle{\begin{tabular}{c}Regular Languages\end{tabular}}

While regular expressions are very useful for lexing, 
there is no regular expression that can recognise the language \bl{$a^nb^n$}.\bigskip

\bl{$(((()()))())$} \;\;vs.\;\; \bl{$(((()()))()))$}
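Deciding whether a string of parentheses is balanced, as in the two strings above, needs a counter for the currently open parentheses. That counter is unbounded, which is, intuitively, the kind of memory a regular expression lacks. A minimal Python sketch of such a counter-based check (the name \texttt{balanced} is hypothetical):

```python
def balanced(s: str) -> bool:
    """Sketch: True if every ')' closes an earlier '(' and none stay open."""
    depth = 0  # number of currently open parentheses; can grow without bound
    for c in s:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:  # a ')' without a matching '('
                return False
    return depth == 0


print(balanced("(((()()))())"))   # True
print(balanced("(((()()))()))"))  # False
```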



A (context-free) grammar \bl{$G$} consists of

\item a finite set of nonterminal symbols (upper case)
\item a finite set of terminal symbols or tokens (lower case)
\item a start symbol (which must be a nonterminal)
\item a finite set of rules
\bl{$A \rightarrow \text{rhs}$}

where each \bl{rhs} is a sequence of terminals and nonterminals,
possibly the empty sequence \bl{$\epsilon$}.\medskip\pause

We also allow rules
\bl{$A \rightarrow \text{rhs}_1 | \text{rhs}_2 | \ldots$}



$S$ & $\rightarrow$ &  $\epsilon$ \\
$S$ & $\rightarrow$ &  $a\cdot S\cdot a$ \\
$S$ & $\rightarrow$ &  $b\cdot S\cdot b$ \\


$S$ & $\rightarrow$ &  $\epsilon \;|\; a\cdot S\cdot a \;|\;b\cdot S\cdot b$ \\
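This grammar generates exactly the even-length palindromes over \bl{$a$} and \bl{$b$}, and each alternative can be read off as one case of a recursive membership test. A minimal Python sketch, assuming the input is a string over the characters \texttt{a} and \texttt{b} (\texttt{in\_S} is a hypothetical name):

```python
def in_S(s: str) -> bool:
    """Sketch: membership test for S -> eps | a.S.a | b.S.b, one case per alternative."""
    if s == "":                                       # S -> eps
        return True
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "ab":
        return in_S(s[1:-1])                          # S -> a.S.a  or  S -> b.S.b
    return False


print([w for w in ["", "aa", "abba", "aba", "ab", "abaaba"] if in_S(w)])
# ['', 'aa', 'abba', 'abaaba']
```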


\frametitle{\begin{tabular}{c}Arithmetic Expressions\end{tabular}}

$E$ & $\rightarrow$ &  $num\_token$ \\
$E$ & $\rightarrow$ &  $E \cdot + \cdot E$ \\
$E$ & $\rightarrow$ &  $E \cdot - \cdot E$ \\
$E$ & $\rightarrow$ &  $E \cdot * \cdot E$ \\
$E$ & $\rightarrow$ &  $( \cdot E \cdot )$ 

\bl{\texttt{1 + 2 * 3 + 4}}


\frametitle{\begin{tabular}{c}A CFG Derivation\end{tabular}}

\item Begin with a string containing only the start symbol, say \bl{$S$}\bigskip
\item Replace any nonterminal \bl{$X$} in the string by the
right-hand side of some production \bl{$X \rightarrow \text{rhs}$}\bigskip
\item Repeat step 2 until there are no nonterminals left

\bl{$S \rightarrow \ldots \rightarrow \ldots  \rightarrow \ldots  \rightarrow \ldots $}
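The derivation procedure can be sketched in Python under a few assumptions: grammar symbols are single characters (upper case for nonterminals), a grammar is a dictionary from nonterminals to lists of right-hand sides, and each step rewrites the \emph{leftmost} nonterminal (every derivable terminal string also has a leftmost derivation, so this loses nothing). The names \texttt{PAL} and \texttt{derive} are hypothetical.

```python
# Sketch: grammar S -> eps | aSa | bSb; upper-case characters are nonterminals.
PAL = {"S": ["", "aSa", "bSb"]}


def derive(grammar: dict, start: str, choices: list) -> list:
    """Replay a leftmost derivation: choices[k] is the index of the production
    used to rewrite the leftmost nonterminal in step k."""
    form = start
    steps = [form]
    for choice in choices:
        i = next(i for i, c in enumerate(form) if c in grammar)  # leftmost nonterminal
        form = form[:i] + grammar[form[i]][choice] + form[i + 1:]
        steps.append(form)
    return steps


print(" -> ".join(derive(PAL, "S", [1, 2, 1, 0])))
# S -> aSa -> abSba -> abaSaba -> abaaba
```

The call replays the derivation of \bl{$abaaba$} with the grammar \bl{$S \rightarrow \epsilon \;|\; aSa \;|\; bSb$}.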


\frametitle{\begin{tabular}{c}Example Derivation\end{tabular}}

$S$ & $\rightarrow$ &  $\epsilon \;|\; a\cdot S\cdot a \;|\;b\cdot S\cdot b$ \\

\bl{$S$} & \bl{$\rightarrow$} & \bl{$aSa$}\\
              & \bl{$\rightarrow$} & \bl{$abSba$}\\
              & \bl{$\rightarrow$} & \bl{$abaSaba$}\\
              & \bl{$\rightarrow$} & \bl{$abaaba$}\\



\frametitle{\begin{tabular}{c}Example Derivation\end{tabular}}

$E$ & $\rightarrow$ &  $num\_token$ \\
$E$ & $\rightarrow$ &  $E \cdot + \cdot E$ \\
$E$ & $\rightarrow$ &  $E \cdot - \cdot E$ \\
$E$ & $\rightarrow$ &  $E \cdot * \cdot E$ \\
$E$ & $\rightarrow$ &  $( \cdot E \cdot )$ 

\bl{$E$} & \bl{$\rightarrow$} & \bl{$E*E$}\\
              & \bl{$\rightarrow$} & \bl{$E+E*E$}\\
              & \bl{$\rightarrow$} & \bl{$E+E*E+E$}\\
              & \bl{$\rightarrow^+$} & \bl{$1+2*3+4$}\\
\end{tabular} &\pause
\bl{$E$} & \bl{$\rightarrow$} & \bl{$E+E$}\\
              & \bl{$\rightarrow$} & \bl{$E+E+E$}\\
              & \bl{$\rightarrow$} & \bl{$E+E*E+E$}\\
              & \bl{$\rightarrow^+$} & \bl{$1+2*3+4$}\\


\frametitle{\begin{tabular}{c}Language of a CFG\end{tabular}}

Let \bl{$G$} be a context-free grammar with terminals \bl{$T$} and start symbol \bl{$S$}. 
Then the language \bl{$L(G)$} is:

\bl{$\{c_1\ldots c_n \;|\; \forall i.\; c_i \in T \wedge S \rightarrow^* c_1\ldots c_n \}$}

\item The strings consist of terminals only, because there are no rules for replacing terminals.
\item Once generated, terminals are ``permanent''.
\item Terminals ought to be tokens of the language\\
(but can also be strings).
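Because terminals are permanent, a sentential form that already contains more than \bl{$n$} terminals can never derive a string of length at most \bl{$n$}. A minimal Python sketch that uses this to list the strings of \bl{$L(G)$} up to a given length, by breadth-first search over leftmost derivations. It assumes single-character symbols and, for termination, a grammar like the palindrome grammar, where sentential forms cannot grow through nonterminals alone (a rule such as \bl{$S \rightarrow S \cdot S$} would make it loop). The names are hypothetical.

```python
from collections import deque

# Sketch: S -> eps | a.S.a | b.S.b; upper-case characters are nonterminals.
PAL = {"S": ["", "aSa", "bSb"]}


def language_upto(grammar: dict, start: str, n: int) -> set:
    """All strings of length <= n in L(G), via BFS over leftmost derivations."""
    found, seen = set(), set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form in seen:
            continue
        seen.add(form)
        # terminals are permanent: more than n of them can never shrink again
        if sum(c not in grammar for c in form) > n:
            continue
        nts = [i for i, c in enumerate(form) if c in grammar]
        if not nts:                    # only terminals left: a word of L(G)
            found.add(form)
            continue
        i = nts[0]                     # rewrite the leftmost nonterminal
        for rhs in grammar[form[i]]:
            queue.append(form[:i] + rhs + form[i + 1:])
    return found


print(sorted(language_upto(PAL, "S", 4), key=lambda w: (len(w), w)))
# ['', 'aa', 'bb', 'aaaa', 'abba', 'baab', 'bbbb']
```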


\frametitle{\begin{tabular}{c}Parse Trees\end{tabular}}

$E$ & $\rightarrow$ &  $F \;|\; F \cdot * \cdot F$\\
$F$ & $\rightarrow$ & $T \;|\; T \cdot + \cdot T \;|\; T \cdot - \cdot T$\\
$T$ & $\rightarrow$ & $num\_token \;|\; ( \cdot E \cdot )$\\
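In this grammar the next token always decides which alternative to take, so it can be parsed by recursive descent: one function per nonterminal, one branch per alternative. A minimal Python sketch that builds parse trees as nested tuples; the tokenizer, the tuple encoding and all names are assumptions for illustration.

```python
import re


def tokenize(s: str) -> list:
    """Sketch: numbers and the symbols + - * ( ); anything else is skipped."""
    return re.findall(r"\d+|[-+*()]", s)


def parse(s: str):
    """Recursive descent for E -> F | F*F, F -> T | T+T | T-T, T -> num | (E)."""
    toks = tokenize(s)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise ValueError(f"expected {expected!r}, got {peek()!r}")
        pos += 1

    def E():  # E -> F | F * F
        left = F()
        if peek() == "*":
            eat("*")
            return ("*", left, F())
        return left

    def F():  # F -> T | T + T | T - T
        left = T()
        if peek() in ("+", "-"):
            op = peek()
            eat(op)
            return (op, left, T())
        return left

    def T():  # T -> num_token | ( E )
        nonlocal pos
        tok = peek()
        if tok == "(":
            eat("(")
            tree = E()
            eat(")")
            return tree
        if tok is not None and tok.isdigit():
            pos += 1
            return int(tok)
        raise ValueError(f"unexpected token {tok!r}")

    tree = E()
    if pos != len(toks):
        raise ValueError(f"unparsed input from token {pos}")
    return tree


print(parse("(2*3)+(3+4)"))  # ('+', ('*', 2, 3), ('+', 3, 4))
print(parse("1+2*3+4"))      # ('*', ('+', 1, 2), ('+', 3, 4))
```

In this grammar \bl{$+$} binds tighter than \bl{$*$}, so \texttt{1+2*3+4} is read as \texttt{(1+2)*(3+4)}, and a sum of three terms such as \texttt{1+2+3} is only accepted with parentheses.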

\begin{tikzpicture}[level distance=8mm, blue]
  \node {$E$}
    child {node {$F$} 
     child {node {$T$} 
                 child {node {(\,$E$\,)}
                            child {node{$F$ *{} $F$}
                                  child {node {$T$} child {node {2}}}
                                  child {node {$T$} child {node {3}}} 
                            }}}
     child {node {+}}
     child {node {$T$}
       child {node {(\,$E$\,)} 
       child {node {$F$}
       child {node {$T$ +{} $T$}
                    child {node {3}}
                    child {node {4}} 
       }}}}};
\end{tikzpicture}



\frametitle{\begin{tabular}{c}Arithmetic Expressions\end{tabular}}

$E$ & $\rightarrow$ &  $num\_token$ \\
$E$ & $\rightarrow$ &  $E \cdot + \cdot E$ \\
$E$ & $\rightarrow$ &  $E \cdot - \cdot E$ \\
$E$ & $\rightarrow$ &  $E \cdot * \cdot E$ \\
$E$ & $\rightarrow$ &  $( \cdot E \cdot )$ 

A CFG is \alert{left-recursive} if it has a nonterminal \bl{$E$} such
that \bl{$E \rightarrow^+ E\cdot \ldots$}
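Left recursion can be detected mechanically: compute, for each nonterminal, which nonterminals can appear first in a sentential form derived from it, and check whether it reaches itself. A minimal Python sketch, assuming no nonterminal can derive \bl{$\epsilon$} (otherwise nullable symbols at the front of a right-hand side would also have to be skipped); grammars are dictionaries from nonterminals to lists of right-hand sides, and all names are hypothetical.

```python
def left_recursive(grammar: dict) -> set:
    """Sketch: nonterminals E with E ->+ E . ... (assumes no rule derives eps)."""
    # first[A]: nonterminals that start some right-hand side of A
    first = {a: {rhs[0] for rhs in alts if rhs and rhs[0] in grammar}
             for a, alts in grammar.items()}
    reach = {a: set(s) for a, s in first.items()}
    changed = True
    while changed:  # transitive closure: A ->+ B . ...
        changed = False
        for a in grammar:
            new = set().union(*(first[b] for b in reach[a]))
            if not new <= reach[a]:
                reach[a] |= new
                changed = True
    return {a for a in grammar if a in reach[a]}


ARITH = {"E": [["num"], ["E", "+", "E"], ["E", "-", "E"],
               ["E", "*", "E"], ["(", "E", ")"]]}
UNAMBIG = {"E": [["F"], ["F", "*", "F"]],
           "F": [["T"], ["T", "+", "T"], ["T", "-", "T"]],
           "T": [["num"], ["(", "E", ")"]]}

print(left_recursive(ARITH))    # {'E'}
print(left_recursive(UNAMBIG))  # set()
```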


\frametitle{\begin{tabular}{c}Ambiguous Grammars\end{tabular}}

A grammar is \alert{ambiguous} if there is a string that has at least two different parse trees.

$E$ & $\rightarrow$ &  $num\_token$ \\
$E$ & $\rightarrow$ &  $E \cdot + \cdot E$ \\
$E$ & $\rightarrow$ &  $E \cdot - \cdot E$ \\
$E$ & $\rightarrow$ &  $E \cdot * \cdot E$ \\
$E$ & $\rightarrow$ &  $( \cdot E \cdot )$ 

\bl{\texttt{1 + 2 * 3 + 4}}
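Every binary operator in \texttt{1 + 2 * 3 + 4} can end up at the root of a subtree, so the parse trees can be enumerated by splitting each span of tokens at each of its operators. A Python sketch, assuming the input is already split into tokens with operands at even and operators at odd positions; the names are hypothetical.

```python
from functools import lru_cache

# Sketch: tokens of 1 + 2 * 3 + 4; operands at even, operators at odd positions.
TOKENS = ("1", "+", "2", "*", "3", "+", "4")
OPS = {"+": lambda x, y: x + y, "-": lambda x, y: x - y, "*": lambda x, y: x * y}


@lru_cache(maxsize=None)
def trees(i: int, j: int) -> tuple:
    """All parse trees of TOKENS[i:j] under E -> num | E+E | E-E | E*E,
    as (fully bracketed string, value) pairs."""
    if j - i == 1:
        return ((TOKENS[i], int(TOKENS[i])),)
    result = []
    for k in range(i + 1, j - 1, 2):   # TOKENS[k] becomes the root operator
        for ls, lv in trees(i, k):
            for rs, rv in trees(k + 1, j):
                result.append((f"({ls}{TOKENS[k]}{rs})", OPS[TOKENS[k]](lv, rv)))
    return tuple(result)


for s, v in trees(0, len(TOKENS)):
    print(s, "=", v)
```

It finds five parse trees (with three operators, the Catalan number \bl{$C_3 = 5$}), with values 11, 13, 15 and 21, so here the ambiguity also changes the meaning.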


\frametitle{\begin{tabular}{c}Dangling Else\end{tabular}}

Another ambiguous grammar:\bigskip

$E$ & $\rightarrow$ &  if $E$ then $E$\\
 & $|$ &  if $E$ then $E$ else $E$ \\
 & $|$ &  \ldots

\bl{\texttt{if a then if x then y else c}}
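The ambiguity becomes visible with a parser that returns \emph{all} ways of parsing a prefix of the input, as (tree, position) pairs. A Python sketch, assuming the \ldots{} alternatives are replaced by single identifiers and the input is a list of words; the tree encoding and names are hypothetical.

```python
KEYWORDS = {"if", "then", "else"}


def parse_E(toks: list, i: int):
    """Sketch: yield (tree, next_index) for every way E can match toks from i,
    for E -> if E then E | if E then E else E | identifier."""
    if i >= len(toks):
        return
    if toks[i] not in KEYWORDS:                      # identifier
        yield toks[i], i + 1
    elif toks[i] == "if":
        for cond, j in parse_E(toks, i + 1):
            if j < len(toks) and toks[j] == "then":
                for thn, k in parse_E(toks, j + 1):
                    yield ("if", cond, thn), k       # without else branch
                    if k < len(toks) and toks[k] == "else":
                        for els, m in parse_E(toks, k + 1):
                            yield ("if", cond, thn, els), m


toks = "if a then if x then y else c".split()
for tree, end in parse_E(toks, 0):
    if end == len(toks):                             # whole input consumed
        print(tree)
```

It reports two complete parses: one attaches \texttt{else c} to the inner \texttt{if}, the other to the outer \texttt{if}.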


\frametitle{\begin{tabular}{c}Parser Combinators\end{tabular}}

Parser combinators: \bigskip

\mbox{}\hspace{-12mm}\mbox{}$\underbrace{\text{list of tokens}}_{\text{input}}$ \bl{$\Rightarrow$} 
$\underbrace{\text{set of (parsed input, unparsed input)}}_{\text{output}}$

\item sequencing
\item alternative
\item semantic action



Alternative parser (code \bl{$p\;||\;q$})\bigskip

\item apply both \bl{$p$} and \bl{$q$} to the same input; then take the union of the outputs

\large \bl{$p(\text{input}) \cup q(\text{input})$}



Sequence parser (code \bl{$p\sim q$})\bigskip

\item apply first \bl{$p$} producing a set of pairs
\item then apply \bl{$q$} to each of the unparsed parts
\item then combine the results:\\ \mbox{}\;\;((output$_1$, output$_2$), unparsed part)

\large \bl{$\{((o_1, o_2), u_2) \;|\;$}\\[2mm] 
\large\mbox{}\hspace{15mm} \bl{$(o_1, u_1) \in p(\text{input}) \wedge$}\\[2mm]
\large\mbox{}\hspace{15mm} \bl{$(o_2, u_2) \in q(u_1)\}$}



Function parser (code \bl{$p \Rightarrow f$})\bigskip

\item apply \bl{$p$} producing a set of pairs
\item then apply the function \bl{$f$} to each first component

\large \bl{$\{(f(o_1), u_1) \;|\; (o_1, u_1) \in p(\text{input})\}$}

\bl{$f$} is the semantic action (``what to do with the parsed input'')
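The three set comprehensions translate almost literally into Python, with a parser represented as a function from an input string to a set of (output, unparsed rest) pairs. A minimal sketch; \texttt{char} is a hypothetical atomic parser, and \texttt{alt}, \texttt{seq} and \texttt{fun} are hypothetical stand-ins for \bl{$||$}, \bl{$\sim$} and \bl{$\Rightarrow$}. The type hints follow the typing rules for parsers (pairs for sequencing, equal types for alternatives).

```python
from typing import Callable, Set, Tuple, TypeVar

T = TypeVar("T")
S = TypeVar("S")

# Sketch: a parser maps an input string to a set of (output, unparsed rest) pairs.
Parser = Callable[[str], Set[Tuple[T, str]]]


def char(c: str) -> Parser[str]:
    """Atomic parser: accept the single character c at the front of the input."""
    return lambda inp: {(c, inp[1:])} if inp[:1] == c else set()


def alt(p: Parser[T], q: Parser[T]) -> Parser[T]:
    """p || q: p(input) united with q(input); both must produce type T."""
    return lambda inp: p(inp) | q(inp)


def seq(p: Parser[T], q: Parser[S]) -> Parser[Tuple[T, S]]:
    """p ~ q: {((o1, o2), u2) | (o1, u1) in p(input), (o2, u2) in q(u1)}."""
    return lambda inp: {((o1, o2), u2) for o1, u1 in p(inp) for o2, u2 in q(u1)}


def fun(p: Parser[T], f: Callable[[T], S]) -> Parser[S]:
    """p => f: {(f(o1), u1) | (o1, u1) in p(input)}."""
    return lambda inp: {(f(o1), u1) for o1, u1 in p(inp)}


ab = seq(char("a"), char("b"))
print(ab("abc"))                      # {(('a', 'b'), 'c')}
join = fun(ab, lambda pair: pair[0] + pair[1])
print(join("abc"))                    # {('ab', 'c')}
# The output is a set: both alternatives may succeed on the same input.
print(alt(char("a"), join)("abc"))
```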


\frametitle{\begin{tabular}{c}Semantic Actions\end{tabular}}


\bl{$T \sim + \sim E \Rightarrow \underbrace{f((x,y), z) \Rightarrow x + z}_{\text{semantic action}}$}


\bl{$F \sim * \sim T \Rightarrow f((x,y), z) \Rightarrow x * z$}


\bl{$\text{(} \sim E \sim \text{)} \Rightarrow f((x,y), z) \Rightarrow y$}
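A Python sketch of these semantic actions in a small calculator, assuming the grammar \bl{$E \rightarrow T \cdot + \cdot E \;|\; T$}, \bl{$T \rightarrow F \cdot * \cdot T \;|\; F$}, \bl{$F \rightarrow ( \cdot E \cdot ) \;|\; num\_token$}, which is one grammar consistent with the three rules above. Because sequencing nests to the left, \bl{$T \sim + \sim E$} produces \bl{$((x, y), z)$}, so the action for addition reads \bl{$x$} and \bl{$z$}. The combinators are redefined so the block runs on its own; all names are hypothetical, and the input must not contain spaces.

```python
import re


# Sketch: a parser maps a string to a set of (output, unparsed rest) pairs.
def tok(s):
    return lambda inp: {(s, inp[len(s):])} if inp.startswith(s) else set()


def num(inp):
    m = re.match(r"\d+", inp)  # maximal run of digits
    return {(int(m.group()), inp[m.end():])} if m else set()


def alt(p, q):
    return lambda inp: p(inp) | q(inp)


def seq(p, q):
    return lambda inp: {((o1, o2), u2) for o1, u1 in p(inp) for o2, u2 in q(u1)}


def fun(p, f):
    return lambda inp: {(f(o), u) for o, u in p(inp)}


# Assumed grammar: E -> T + E | T,  T -> F * T | F,  F -> ( E ) | num
# In each action r = ((x, y), z), so x is r[0][0], y is r[0][1], z is r[1].
def E(inp):
    return alt(fun(seq(seq(T, tok("+")), E), lambda r: r[0][0] + r[1]), T)(inp)


def T(inp):
    return alt(fun(seq(seq(F, tok("*")), T), lambda r: r[0][0] * r[1]), F)(inp)


def F(inp):
    return alt(fun(seq(seq(tok("("), E), tok(")")), lambda r: r[0][1]), num)(inp)


def parse_all(p, inp):
    """Keep only the outputs of parses that consumed the whole input."""
    return {o for o, rest in p(inp) if rest == ""}


print(parse_all(E, "1+2*3+4"))      # {11}
print(parse_all(E, "(1+2)*(3+4)"))  # {21}
```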


\frametitle{\begin{tabular}{c}Types of Parsers\end{tabular}}

\item {\bf Sequencing}: if \bl{$p$} returns results of type \bl{$T$}, and \bl{$q$} results of type \bl{$S$},
then \bl{$p \sim q$} returns results of type

\bl{$T \times S$}

\item {\bf Alternative}: if \bl{$p$} returns results of type \bl{$T$} then  \bl{$q$} \alert{must} also have results of type \bl{$T$},
and \bl{$p \;||\; q$} returns results of type

\bl{$T$}

\item {\bf Semantic Action}: if \bl{$p$} returns results of type \bl{$T$} and \bl{$f$} is a function from
\bl{$T$} to \bl{$S$}, then
\bl{$p \Rightarrow f$} returns results of type

\bl{$S$}



\frametitle{\begin{tabular}{c}Input Types of Parsers\end{tabular}}

\item input: \alert{string}
\item output: set of (output\_type, \alert{string})

Actually, the input can be of any type, as long as it is some kind of sequence
(for example a string or a list of tokens).


\frametitle{\begin{tabular}{c}Scannerless Parsers\end{tabular}}

\item input: \alert{string}
\item output: set of (output\_type, \alert{string})

but a separate lexer is better when whitespace or comments need to be filtered out;
then the input is a sequence of tokens
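A lexer that drops whitespace and comments before parsing can be sketched with Python's \texttt{re} module. The token classes (numbers, identifiers, \texttt{:=} and single-character operators, \texttt{//} line comments) and the name \texttt{lex} are assumptions for illustration.

```python
import re

# Sketch: hypothetical token classes for a small While-like language.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)                # whitespace: dropped
  | (?P<comment>//[^\n]*)      # line comment: dropped
  | (?P<num>\d+)
  | (?P<id>[A-Za-z_]\w*)
  | (?P<op>:=|[-+*/;(){}<>=])
""", re.VERBOSE)


def lex(src: str) -> list:
    """Split src into tokens, filtering out whitespace and comments."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup not in ("ws", "comment"):
            tokens.append(m.group())
        pos = m.end()
    return tokens


print(lex("x := 5;  // initialise x\ny := x * 3"))
# ['x', ':=', '5', ';', 'y', ':=', 'x', '*', '3']
```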


\frametitle{\begin{tabular}{c}Successful Parses\end{tabular}}

\item input: string
\item output: \alert{set of} (output\_type, string)

A parse is successful whenever the input has been
fully ``consumed'' (that is, the second component is empty).


\frametitle{Abstract Parser Class}
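A minimal sketch of an abstract parser class in Python, assuming the input is any hashable sequence (a string, or a tuple of tokens) so that (output, rest) pairs can be stored in a set. Subclasses implement \texttt{parse}; \texttt{parse\_all} keeps only the successful parses, those whose unparsed rest is empty. The class names and the use of \texttt{|} for alternatives are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Sequence, Set, Tuple


class Parser(ABC):
    """Sketch: input is any hashable sequence (str, or tuple of tokens)."""

    @abstractmethod
    def parse(self, inp: Sequence) -> Set[Tuple[Any, Sequence]]:
        """All (output, unparsed rest) pairs for prefixes of inp."""

    def parse_all(self, inp: Sequence) -> Set[Any]:
        """Outputs of successful parses, i.e. those with an empty rest."""
        return {out for out, rest in self.parse(inp) if len(rest) == 0}

    def __or__(self, other: "Parser") -> "Parser":
        """p | q builds the alternative parser."""
        return AltParser(self, other)


class AltParser(Parser):
    def __init__(self, p: Parser, q: Parser):
        self.p, self.q = p, q

    def parse(self, inp):
        return self.p.parse(inp) | self.q.parse(inp)


class LitParser(Parser):
    """Accept the literal lit (a str, or a tuple of tokens) at the front of inp."""

    def __init__(self, lit: Sequence):
        self.lit = lit

    def parse(self, inp):
        n = len(self.lit)
        return {(self.lit, inp[n:])} if inp[:n] == self.lit else set()


p = LitParser("a") | LitParser("ab")
print(p.parse("ab"))          # both alternatives succeed on a prefix
print(p.parse_all("ab"))      # {'ab'}: only this parse consumed everything
q = LitParser(("if",)) | LitParser(("if", "x"))
print(q.parse_all(("if", "x")))  # {('if', 'x')}
```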



\frametitle{\begin{tabular}{c}Two Grammars\end{tabular}}

Which languages are recognised by the following two grammars?

$S$ & $\rightarrow$ &  $1 \cdot S \cdot S$\\
        & $|$ & $\epsilon$

$U$ & $\rightarrow$ &  $1 \cdot U$\\
        & $|$ & $\epsilon$
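Both grammars generate every string of \bl{$1$}s, including the empty string, but they differ in how many parse trees such a string has: for \bl{$U$} there is exactly one, while for \bl{$S$} the counts are the Catalan numbers. A Python sketch that counts them (the function names are hypothetical):

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def trees_S(n: int) -> int:
    """Number of parse trees of '1' * n for S -> 1.S.S | eps."""
    if n == 0:
        return 1  # only S -> eps
    # S -> 1.S.S: the first 1 is fixed; the other n-1 ones are split between the two S
    return sum(trees_S(k) * trees_S(n - 1 - k) for k in range(n))


def trees_U(n: int) -> int:
    """Number of parse trees of '1' * n for U -> 1.U | eps."""
    # U -> 1.U consumes one 1 and U -> eps stops: exactly one choice per step
    return 1 if n == 0 else trees_U(n - 1)


print([trees_S(n) for n in range(8)])  # [1, 1, 2, 5, 14, 42, 132, 429]
print([trees_U(n) for n in range(8)])  # [1, 1, 1, 1, 1, 1, 1, 1]
```

A parser that computes all parses therefore has far more work to do with \bl{$S$} than with \bl{$U$} on long inputs.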


\frametitle{\begin{tabular}{c}Ambiguous Grammars\end{tabular}}


\begin{tikzpicture}[y=.2cm, x=.009cm]
	\draw (0,0) -- coordinate (x axis mid) (1000,0);
    	\draw (0,0) -- coordinate (y axis mid) (0,30);
    	\foreach \x in {0, 20, 100, 200,...,1000}
     		\draw (\x,1pt) -- (\x,-3pt)
			node[anchor=north] {\small \x};
    	\foreach \y in {0,5,...,30}
     		\draw (1pt,\y) -- (-3pt,\y) 
     			node[anchor=east] {\small\y}; 
	\node[below=0.6cm] at (x axis mid) {\bl{1}s};
	\node[rotate=90, left=1.2cm] at (y axis mid) {secs};

	\draw[color=blue] plot[mark=*, mark options={fill=white}] 
		file {};
         \only<2->{\draw[color=red] plot[mark=triangle*, mark options={fill=white} ] 
                  file {};}
	\draw[color=blue] (0,0) -- 
		plot[mark=*, mark options={fill=white}] (0.25,0) -- (0.5,0) 
		node[right]{\small unambiguous};
	\only<2->{\draw[yshift=\baselineskip, color=red] (0,0) -- 
                plot[mark=triangle*, mark options={fill=white}] (0.25,0) -- (0.5,0)
                node[right]{\small ambiguous};}
\end{tikzpicture}



$Stmt$ & $\rightarrow$ &  $\text{skip}$\\
              & $|$ & $Id := AExp$\\
              & $|$ & $\text{if}\; B\!Exp \;\text{then}\; Block \;\text{else}\; Block$\\
              & $|$ & $\text{while}\; B\!Exp \;\text{do}\; Block$\medskip\\
$Stmts$ & $\rightarrow$ &  $Stmt \;\text{;}\; Stmts$\\
              & $|$ & $Stmt$\medskip\\
$Block$ & $\rightarrow$ &  $\{ Stmts \}$\\
                & $|$ & $Stmt$\medskip\\
$AExp$ & $\rightarrow$ & \ldots\\
$BExp$ & $\rightarrow$ & \ldots\\


\frametitle{\begin{tabular}{c}An Interpreter\end{tabular}}

\;\;$x := 5 \text{;}$\\
\;\;$y := x * 3\text{;}$\\
\;\;$y := x * 4\text{;}$\\
\;\;$x := u * 3$\\

\item the interpreter has to record the value of \bl{$x$} before assigning a value to \bl{$y$}\pause
\item \bl{$\text{eval}(stmt, env)$}
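A minimal sketch of such an interpreter in Python, with the environment as a dictionary from variable names to values and each statement returning an updated environment. The tuple encoding of the abstract syntax, the operator table and the names \texttt{eval\_stmt} and \texttt{eval\_block} (Python's \texttt{eval} is a built-in) are assumptions.

```python
# Sketch: hypothetical AST encoding with nested tuples:
#   ("skip",), ("assign", x, aexp), ("if", bexp, block, block), ("while", bexp, block)
# a block is a list of statements; expressions are ("num", n), ("var", x) or (op, e1, e2)
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b,
       "<": lambda a, b: a < b, "==": lambda a, b: a == b}


def eval_exp(e, env):
    if e[0] == "num":
        return e[1]
    if e[0] == "var":
        return env[e[1]]  # KeyError if the variable was never assigned
    op, e1, e2 = e
    return OPS[op](eval_exp(e1, env), eval_exp(e2, env))


def eval_stmt(s, env):
    """Execute statement s in environment env; return the updated environment."""
    kind = s[0]
    if kind == "skip":
        return env
    if kind == "assign":
        return {**env, s[1]: eval_exp(s[2], env)}
    if kind == "if":
        return eval_block(s[2] if eval_exp(s[1], env) else s[3], env)
    if kind == "while":
        while eval_exp(s[1], env):
            env = eval_block(s[2], env)
        return env
    raise ValueError(f"unknown statement {s!r}")


def eval_block(stmts, env):
    for s in stmts:
        env = eval_stmt(s, env)
    return env


prog = [("assign", "x", ("num", 5)),
        ("assign", "y", ("*", ("var", "x"), ("num", 3))),
        ("assign", "y", ("*", ("var", "x"), ("num", 4)))]
print(eval_block(prog, {}))  # {'x': 5, 'y': 20}
```

The example runs the first three assignments of the program above; the fourth reads \bl{$u$}, which has no value in the environment, so this sketch would raise a \texttt{KeyError} there.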




