# HG changeset patch # User Christian Urban # Date 1477145891 -3600 # Node ID 780486571e38179eccd174a10f39b5d94bb30008 # Parent 896a5f91838d2366f62542110de83ae3c1f8a74c updated diff -r 896a5f91838d -r 780486571e38 handouts/ho03.pdf Binary file handouts/ho03.pdf has changed diff -r 896a5f91838d -r 780486571e38 handouts/ho03.tex --- a/handouts/ho03.tex Sat Oct 22 13:11:33 2016 +0100 +++ b/handouts/ho03.tex Sat Oct 22 15:18:11 2016 +0100 @@ -17,7 +17,7 @@ widely known rather recently. Still let us in this lecture have a closer look at automata and their relation to regular expressions. This will help us with understanding why the -regular expression matchers in Python and Ruby are so slow +regular expression matchers in Python, Ruby and Java are so slow with certain regular expressions. The central definition is:\medskip @@ -177,7 +177,8 @@ have to ``consume'' any part of the input string, but ``silently'' change to a different state. In the left picture, for example, if you are in the starting state, you can -silently move either to $q_1$ or $q_2$. +silently move either to $q_1$ or $q_2$. This silent +transition is also often called \emph{$\epsilon$-transition}. \subsubsection*{Thompson Construction} @@ -558,7 +559,7 @@ constructed. First have a look at the second equation: the left-hand side is $q_1$ and the right-hand side $q_0\,a$. The right-hand side is essentially all possible ways how to end up -in $q_1$. There is only one incoming edge from $q_0$ consuming +in node $q_1$. There is only one incoming edge from $q_0$ consuming an $a$. Therefore the right hand side is this state followed by character---in this case $q_0\,a$. Now lets have a look at the third equation: there are two incoming diff -r 896a5f91838d -r 780486571e38 handouts/ho05.pdf Binary file handouts/ho05.pdf has changed diff -r 896a5f91838d -r 780486571e38 handouts/ho05.tex --- a/handouts/ho05.tex Sat Oct 22 13:11:33 2016 +0100 +++ b/handouts/ho05.tex Sat Oct 22 15:18:11 2016 +0100 @@ -1,7 +1,7 @@ \documentclass{article} \usepackage{../style} \usepackage{../langs} - +\usepackage{../grammar} \begin{document} @@ -48,25 +48,27 @@ Context-free languages are usually specified by grammars. For example a grammar for well-parenthesised expressions is -\begin{center} -$P \;\;\rightarrow\;\; ( \cdot P \cdot ) \cdot P \;|\; \epsilon$ -\end{center} - +\begin{plstx}[margin=3cm] +: \meta{P} ::= ( \cdot \meta{P} \cdot ) \cdot \meta{P} + | \epsilon\\ +\end{plstx} + \noindent or a grammar for recognising strings consisting of ones is -\begin{center} -$O \;\;\rightarrow\;\; 1 \cdot O \;|\; 1$ -\end{center} +\begin{plstx}[margin=3cm] +: \meta{O} ::= 1 \cdot \meta{O} + | 1\\ +\end{plstx} In general grammars consist of finitely many rules built up from \emph{terminal symbols} (usually lower-case letters) and -\emph{non-terminal symbols} (upper-case letters). Rules have +\emph{non-terminal symbols} (upper-case letters inside \meta{\mbox{}}). Rules have the shape -\begin{center} -$NT \;\;\rightarrow\;\; \textit{rhs}$ -\end{center} +\begin{plstx}[margin=3cm] +: \meta{NT} ::= rhs\\ +\end{plstx} \noindent where on the left-hand side is a single non-terminal and on the right a string consisting of both terminals and @@ -77,29 +79,29 @@ convention to use $|$ as a shorthand notation for several rules. For example -\begin{center} -$NT \;\;\rightarrow\;\; \textit{rhs}_1 \;|\; \textit{rhs}_2$ -\end{center} +\begin{plstx}[margin=3cm] +: \meta{NT} ::= rhs_1 + | rhs_2\\ +\end{plstx} -\noindent means that the non-terminal $NT$ can be replaced by +\noindent means that the non-terminal \meta{NT} can be replaced by either $\textit{rhs}_1$ or $\textit{rhs}_2$. If there are more than one non-terminal on the left-hand side of the rules, then we need to indicate what is the \emph{starting} symbol of the grammar. For example the grammar for arithmetic expressions can be given as follows -\begin{center} -\begin{tabular}{lcl@{\hspace{2cm}}l} -$E$ & $\rightarrow$ & $N$ & (1)\\ -$E$ & $\rightarrow$ & $E \cdot + \cdot E$ & (2)\\ -$E$ & $\rightarrow$ & $E \cdot - \cdot E$ & (3)\\ -$E$ & $\rightarrow$ & $E \cdot * \cdot E$ & (4)\\ -$E$ & $\rightarrow$ & $( \cdot E \cdot )$ & (5)\\ -$N$ & $\rightarrow$ & $N \cdot N \;|\; 0 \;|\; 1 \;|\: \ldots \;|\; 9$ & (6\ldots) -\end{tabular} -\end{center} +\begin{plstx}[margin=3cm,one per line] +\mbox{\rm (1)}: \meta{E} ::= \meta{N}\\ +\mbox{\rm (2)}: \meta{E} ::= \meta{E} \cdot + \cdot \meta{E}\\ +\mbox{\rm (3)}: \meta{E} ::= \meta{E} \cdot - \cdot \meta{E}\\ +\mbox{\rm (4)}: \meta{E} ::= \meta{E} \cdot * \cdot \meta{E}\\ +\mbox{\rm (5)}: \meta{E} ::= ( \cdot \meta{E} \cdot )\\ +\mbox{\rm (6\ldots)}: \meta{N} ::= \meta{N} \cdot \meta{N} + \mid 0 \mid 1 \mid \ldots \mid 9\\ +\end{plstx} -\noindent where $E$ is the starting symbol. A +\noindent where \meta{E} is the starting symbol. A \emph{derivation} for a grammar starts with the starting symbol of the grammar and in each step replaces one non-terminal by a right-hand side of a rule. A derivation ends @@ -109,12 +111,12 @@ \begin{center} \begin{tabular}{lll@{\hspace{2cm}}l} -$E$ & $\rightarrow$ & $E+E$ & by (2)\\ - & $\rightarrow$ & $(E)+E$ & by (5)\\ - & $\rightarrow$ & $(E+E)+E$ & by (2)\\ - & $\rightarrow$ & $(E+E)+N$ & by (1)\\ - & $\rightarrow$ & $(E+E)+3$ & by (6\dots)\\ - & $\rightarrow$ & $(N+E)+3$ & by (1)\\ +\meta{E} & $\rightarrow$ & $\meta{E}+\meta{E}$ & by (2)\\ + & $\rightarrow$ & $(\meta{E})+\meta{E}$ & by (5)\\ + & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{E}$ & by (2)\\ + & $\rightarrow$ & $(\meta{E}+\meta{E})+\meta{N}$ & by (1)\\ + & $\rightarrow$ & $(\meta{E}+\meta{E})+3$ & by (6\dots)\\ + & $\rightarrow$ & $(\meta{N}+\meta{E})+3$ & by (1)\\ & $\rightarrow^+$ & $(1+2)+3$ & by (1, 6\ldots)\\ \end{tabular} \end{center} @@ -239,6 +241,9 @@ undecidable. But in simple instance (the ones we deal in this module) one can usually see when a grammar is ambiguous. + +\subsection*{Parser Combinators} + Let us now turn to the problem of generating a parse-tree for a grammar and string. In what follows we explain \emph{parser combinators}, because they are easy to implement and closely diff -r 896a5f91838d -r 780486571e38 hws/hw06.pdf Binary file hws/hw06.pdf has changed diff -r 896a5f91838d -r 780486571e38 hws/hw06.tex --- a/hws/hw06.tex Sat Oct 22 13:11:33 2016 +0100 +++ b/hws/hw06.tex Sat Oct 22 15:18:11 2016 +0100 @@ -57,7 +57,15 @@ and (ii) give a sample string involving all rules given in 1.-4.~that can be parsed by this grammar. +\item Given the regular expression +\[(ab + a)\cdot (\ONE + b)\] + +there are two values for how this regular expression can recognise +the string $ab$. Give both values and indicate which one is the +POSIX value. + +\item \POSTSCRIPT \end{enumerate} \end{document} diff -r 896a5f91838d -r 780486571e38 hws/hw09.pdf Binary file hws/hw09.pdf has changed diff -r 896a5f91838d -r 780486571e38 hws/hw09.tex --- a/hws/hw09.tex Sat Oct 22 13:11:33 2016 +0100 +++ b/hws/hw09.tex Sat Oct 22 15:18:11 2016 +0100 @@ -1,6 +1,7 @@ \documentclass{article} \usepackage{../style} \usepackage{../graphics} +\usepackage{../langs} \begin{document} @@ -10,8 +11,36 @@ \begin{enumerate} \item Describe what is meant by \emph{eliminating tail - recursion}, when such an optimization can be applied and - why it is a benefit? + recursion}? When can this optimization be applied and + why is it of benefit? + +\item A programming language has arithmetic expression. For an + arithmetic expression the compiler of this language produces the + following snippet of JVM code. + +\begin{lstlisting}[language=JVMIS,numbers=none] +ldc 1 +ldc 2 +ldc 3 +imul +ldc 4 +ldc 3 +isub +iadd +iadd +\end{lstlisting} + + Give the arithmetic expression that produced this code. Make sure + you give all necessary parentheses. + +\item Describe what the following two JVM instructions do! + +\begin{lstlisting}[language=JVMIS,numbers=none] +iload 3 +istore 1 +\end{lstlisting} + +\item \POSTSCRIPT % \item It is true (I confirmed it) that % diff -r 896a5f91838d -r 780486571e38 slides/slides07.pdf Binary file slides/slides07.pdf has changed diff -r 896a5f91838d -r 780486571e38 slides/slides08.pdf Binary file slides/slides08.pdf has changed diff -r 896a5f91838d -r 780486571e38 slides/slides08.tex --- a/slides/slides08.tex Sat Oct 22 13:11:33 2016 +0100 +++ b/slides/slides08.tex Sat Oct 22 15:18:11 2016 +0100 @@ -5,7 +5,7 @@ \usepackage{../graphics} % beamer stuff -\renewcommand{\slidecaption}{AFL 07, King's College London} +\renewcommand{\slidecaption}{CFL 07, King's College London} \newcommand{\bl}[1]{\textcolor{blue}{#1}} \begin{document} @@ -15,7 +15,7 @@ \frametitle{% \begin{tabular}{@ {}c@ {}} \\[-3mm] - \LARGE Automata and \\[-2mm] + \LARGE Compilers and \\[-2mm] \LARGE Formal Languages (8)\\[3mm] \end{tabular}} diff -r 896a5f91838d -r 780486571e38 slides/slides09.pdf Binary file slides/slides09.pdf has changed diff -r 896a5f91838d -r 780486571e38 slides/slides09.tex --- a/slides/slides09.tex Sat Oct 22 13:11:33 2016 +0100 +++ b/slides/slides09.tex Sat Oct 22 15:18:11 2016 +0100 @@ -34,7 +34,7 @@ \lstset{morekeywords={def,if,then,else,write,read},keywordstyle=\color{codepurple}\bfseries} % beamer stuff -\renewcommand{\slidecaption}{AFL 09, King's College London} +\renewcommand{\slidecaption}{CFL 09, King's College London} \newcommand{\bl}[1]{\textcolor{blue}{#1}} @@ -45,7 +45,7 @@ \frametitle{% \begin{tabular}{@ {}c@ {}} \\[-3mm] - \LARGE Automata and \\[-2mm] + \LARGE Compilers and \\[-2mm] \LARGE Formal Languages (9)\\[3mm] \end{tabular}} diff -r 896a5f91838d -r 780486571e38 slides/slides10.pdf Binary file slides/slides10.pdf has changed diff -r 896a5f91838d -r 780486571e38 slides/slides10.tex --- a/slides/slides10.tex Sat Oct 22 13:11:33 2016 +0100 +++ b/slides/slides10.tex Sat Oct 22 15:18:11 2016 +0100 @@ -32,7 +32,7 @@ % beamer stuff -\renewcommand{\slidecaption}{AFL 10, King's College London} +\renewcommand{\slidecaption}{CFL 10, King's College London} \newcommand{\bl}[1]{\textcolor{blue}{#1}} @@ -43,7 +43,7 @@ \frametitle{% \begin{tabular}{@ {}c@ {}} \\[-3mm] - \LARGE Automata and \\[-2mm] + \LARGE Compilers and \\[-2mm] \LARGE Formal Languages (10)\\[3mm] \end{tabular}}