# HG changeset patch # User Christian Urban # Date 1414499051 0 # Node ID 7ed2a25dd115d793889e7140ff3d27786f08f8f0 # Parent 201c2c6d86964fa4e3b6d31cd5f3492b7e47c03e updated diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho03.pdf Binary file handouts/ho03.pdf has changed diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho03.tex --- a/handouts/ho03.tex Tue Oct 28 06:01:00 2014 +0000 +++ b/handouts/ho03.tex Tue Oct 28 12:24:11 2014 +0000 @@ -113,7 +113,7 @@ the set of strings accepted by an automaton. -While with DFAs it will always clear that given a character +While with DFAs it will always be clear that given a character what the next state is (potentially none), it will be useful to relax this restriction. That means we have several potential successor states. We even allow ``silent @@ -374,10 +374,10 @@ \subsubsection*{Subset Construction} -What is interesting that for every NFA we can find a DFA which -recognises the same language. This can be done by the -\emph{subset construction}. Consider again the NFA on the -left, consisting of nodes labeled $0$, $1$ and $2$. +What is interesting is that for every NFA we can find a DFA +which recognises the same language. This can, for example, be +done by the \emph{subset construction}. Consider again the NFA +on the left, consisting of nodes labeled $0$, $1$ and $2$. \begin{center} \begin{tabular}{c@{\hspace{10mm}}c} @@ -440,27 +440,174 @@ can be calculated from the starting state of the NFA, that is $0$, and then do as many $\epsilon$-transitions as possible. This gives $\{0,1,2\}$ which is the starting state of the DFA. -One terminal states in the DFA are given by all sets that +The terminal states in the DFA are given by all sets that contain a $2$, which is the terminal state of the NFA. This completes the subset construction. -There are two points to note: One is that the resulting DFA -contains a number of ``dead'' nodes that are never reachable -from the starting state (that is that the calculated DFA in -this example is not a minimal DFA). Such dead nodes can be -safely removed without changing the language that is -recognised by the DFA. Another point is that in some cases the -subset construction produces a DFA that does \emph{not} -contain any dead nodes\ldots{}that means it calculates a -minimal DFA. Which in turn means that in some cases the number -of nodes by going from NFAs to DFAs exponentially increases, -namely by $2^n$ (which is the number of subsets you can form -for $n$ nodes). +There are two points to note: One is that very often the +resulting DFA contains a number of ``dead'' nodes that are +never reachable from the starting state (that is that the +calculated DFA in this example is not a minimal DFA). Such +dead nodes can be safely removed without changing the language +that is recognised by the DFA. Another point is that in some +cases, however, the subset construction produces a DFA that +does \emph{not} contain any dead nodes\ldots{}that means it +calculates a minimal DFA. Which in turn means that in some +cases the number of nodes by going from NFAs to DFAs +exponentially increases, namely by $2^n$ (which is the number +of subsets you can form for $n$ nodes). \subsubsection*{Brzozowski's Method} -DFA -> NFA +As said before, we can also go into the other direction---from +DFAs to regular expressions. Brzozowski's method calculates +a regular expression using familiar transformations for +solving equational systems. Consider the DFA: + +\begin{center} +\begin{tikzpicture}[scale=1.5,>=stealth',very thick,auto, + every state/.style={minimum size=0pt, + inner sep=2pt,draw=blue!50,very thick, + fill=blue!20}] + \node[state, initial] (q0) at ( 0,1) {$q_0$}; + \node[state] (q1) at ( 1,1) {$q_1$}; + \node[state, accepting] (q2) at ( 2,1) {$q_2$}; + \path[->] (q0) edge[bend left] node[above] {$a$} (q1) + (q1) edge[bend left] node[above] {$b$} (q0) + (q2) edge[bend left=50] node[below] {$b$} (q0) + (q1) edge node[above] {$a$} (q2) + (q2) edge [loop right] node {$a$} () + (q0) edge [loop below] node {$b$} (); +\end{tikzpicture} +\end{center} + +\noindent for which we can set up the following equational +system + +\begin{eqnarray} +q_0 & = & \epsilon + q_0\,b + q_1\,b + q_2\,b\\ +q_1 & = & q_0\,a\\ +q_2 & = & q_1\,a + q_2\,a +\end{eqnarray} + +\noindent There is an equation for each node in the DFA. Let +us have a look how the right-hand sides of the equations are +constructed. First have a look at the second equation: the +left-hand side is $q_1$ and the right-hand side $q_0\,a$. The +right-hand side is essentially all possible ways how to end up +in $q_1$. There is only one incoming edge from $q_0$ consuming +an $a$. Therefore we say: if we are in $q_0$ consuming an $a$ +then we end up in $q_1$. Therefore the right hand side is +state followed by character---in this case $q_0\,a$. Now lets +have a look at the third equation: there are two incoming +edges. Therefore we have two terms, namely $q_1\,a$ and +$q_2\,a$. These terms are separated by $+$. The first states +that if in state $q_1$ consuming an $a$ will bring you to +$q_2$, and the secont that being in $q_2$ and consuming an $a$ +will make you stay in $q_2$. The right-hand side of the +first equation is constructed similarly: there are three +incoming edges, therefore there are three terms. There is +one exception in that we also ``add'' $\epsilon$ to the +first equation, because it corresponds to the starting state +in the DFA. + +Having constructed the equational system, the question is +how to solve it? Remarkably the rules are very similar to +solving usual linear equational systems. For example the +second equation does not contain the variable $q_1$ on the +right-hand side of the equation. We can therefore eliminate +$q_1$ from the system by just substituting this equation +into the other two. This gives + +\begin{eqnarray} +q_0 & = & \epsilon + q_0\,b + q_0\,a\,b + q_2\,b\\ +q_2 & = & q_0\,a\,a + q_2\,a +\end{eqnarray} + +\noindent where in Equation (4) we have two occurences +of $q_0$. Like the laws about $+$ and $\cdot$, we can simplify +Equation (4) to obtain the following two equations: + +\begin{eqnarray} +q_0 & = & \epsilon + q_0\,(b + a\,b) + q_2\,b\\ +q_2 & = & q_0\,a\,a + q_2\,a +\end{eqnarray} + +\noindent Unfortunately we cannot make any more progress with +substituting equations, because both (6) and (7) contain the +variable on the left-hand side also on the right-hand side. +Here we need to now use a law that is different from the usual +laws. It is called \emph{Arden's rule}. It states that +if an equation is of the form $q = q\,r + s$ then it can be +transformed to $q = s\, r^*$. Since we can assume $+$ is +symmetric, equation (7) is of that form: $s$ is $q_0\,a\,a$ +and $r$ is $a$. That means we can transform Equation (7) +to obtain the two new equations + +\begin{eqnarray} +q_0 & = & \epsilon + q_0\,(b + a\,b) + q_2\,b\\ +q_2 & = & q_0\,a\,a\,(a^*) +\end{eqnarray} + +\noindent Now again we can substitute the second equation into +the first in order to eliminate the variable $q_2$. + +\begin{eqnarray} +q_0 & = & \epsilon + q_0\,(b + a\,b) + q_0\,a\,a\,(a^*)\,b +\end{eqnarray} + +\noindent Pulling $q_0$ out as a single factor gives: + +\begin{eqnarray} +q_0 & = & \epsilon + q_0\,(b + a\,b + a\,a\,(a^*)\,b) +\end{eqnarray} + +\noindent This equation is again of the form so that we can +apply Arden's rule ($r$ is $b + a\,b + a\,a\,(a^*)\,b$ and $s$ +is $\epsilon$). This gives as solution for $q_0$ the following +regular expression: + +\begin{eqnarray} +q_0 & = & \epsilon\,(b + a\,b + a\,a\,(a^*)\,b)^* +\end{eqnarray} + +\noindent SInce this is a regular expression, we can simplify +away the $\epsilon$ to obtain the slightly simpler regular +expression + +\begin{eqnarray} +q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^* +\end{eqnarray} + +\noindent +Now we can unwind this process and obtain the solutions +for the other equations. This gives: + +\begin{eqnarray} +q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\\ +q_1 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\,a\\ +q_2 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\,a\,a\,(a)^* +\end{eqnarray} + +\noindent Finally, we only need to ``add'' up the equations +which correspond to a terminal state. In our running example, +this is just $q_2$. Consequently, a regular expression +that recognises the same language as the automaton is + +\[ +(b + a\,b + a\,a\,(a^*)\,b)^*\,a\,a\,(a)^* +\] + +\noindent You can somewhat crosscheck your solution +by taking a string the regular expression can match and +and see whether it can be matched by the automaton. +One string for example is $aaa$ and \emph{voila} this +string is also matched by the automaton. + +We should prove that Brzozowski's method really produces +an equivalent regular expression for the automaton. But +for the purposes of this module, we omit this. \subsubsection*{Automata Minimization} @@ -694,7 +841,18 @@ the accepting and non-accepting states. You can then translate this DFA back into a regular expression. -Not all languages are regular\ldots{} +Not all languages are regular. The most well-known example +of a language that is not regular consists of all the strings +of the form + +\[a^n\,b^n\] + +\noindent meaning strings that have the same number of $a$s +and $b$s. You can try, but you cannot find a regular +expression for this language and also not an automaton. One +can actually prove that there is no regular expression nor +automaton for this language, but again that would lead us too +far afield for what we want to do in this module. \end{document} diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho05.pdf Binary file handouts/ho05.pdf has changed diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho05.tex --- a/handouts/ho05.tex Tue Oct 28 06:01:00 2014 +0000 +++ b/handouts/ho05.tex Tue Oct 28 12:24:11 2014 +0000 @@ -1,59 +1,25 @@ \documentclass{article} -\usepackage{hyperref} -\usepackage{amssymb} -\usepackage{amsmath} -\usepackage[T1]{fontenc} -\usepackage{tikz} -\usetikzlibrary{arrows} -\usetikzlibrary{automata} -\usetikzlibrary{shapes} -\usetikzlibrary{shadows} -\usetikzlibrary{positioning} -\usetikzlibrary{calc} -\usetikzlibrary{fit} -\usetikzlibrary{backgrounds} +\usepackage{../style} \usepackage{../langs} - -\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% +\usepackage{../graphics} - -\newcommand\grid[1]{% -\begin{tikzpicture}[baseline=(char.base)] - \path[use as bounding box] - (0,0) rectangle (1em,1em); - \draw[red!50, fill=red!20] - (0,0) rectangle (1em,1em); - \node[inner sep=1pt,anchor=base west] - (char) at (0em,\gridraiseamount) {#1}; -\end{tikzpicture}} -\newcommand\gridraiseamount{0.12em} - -\makeatletter -\newcommand\Grid[1]{% - \@tfor\z:=#1\do{\grid{\z}}} -\makeatother - -\newcommand\Vspace[1][.3em]{% - \mbox{\kern.06em\vrule height.3ex}% - \vbox{\hrule width#1}% - \hbox{\vrule height.3ex}} - -\def\VS{\Vspace[0.6em]} - \begin{document} -\section*{Handout 5} +\section*{Handout 5 (Lexing)} -Whenever you want to design a new programming language or implement a compiler for an -existing language, the first task is to fix the basic ``words'' of the language. For example what are the -keywords, or reserved words, of the language, what are permitted identifiers, numbers, expressions and so -on. One convenient way to do this is, of -course, by using regular expressions. +Whenever you want to design a new programming language or +implement a compiler for an existing language, the first task +is to fix the basic ``words'' of the language. For example +what are the keywords, or reserved words, of the language, +what are permitted identifiers, numbers, expressions and so +on. One convenient way to do this is, of course, by using +regular expressions. -In this course we want to take a closer look at the -WHILE programming language. This is a simple imperative programming language consisting of arithmetic -expressions, assignments, if-statements and loops. For example the Fibonacci program can be -written in this language as follows: +In this course we want to take a closer look at the WHILE +programming language. This is a simple imperative programming +language consisting of arithmetic expressions, assignments, +if-statements and loops. For example the Fibonacci program can +be written in this language as follows: \begin{center} \mbox{\lstinputlisting[language=while]{../progs/fib.while}} diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho06.pdf Binary file handouts/ho06.pdf has changed diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho06.tex --- a/handouts/ho06.tex Tue Oct 28 06:01:00 2014 +0000 +++ b/handouts/ho06.tex Tue Oct 28 12:24:11 2014 +0000 @@ -42,7 +42,7 @@ \begin{document} -\section*{Handout 6} +\section*{Handout 6 (Parser Combinators)} While regular expressions are very useful for lexing and for recognising many patterns in strings (like email addresses), they have their limitations. For diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw01.pdf Binary file hws/hw01.pdf has changed diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw02.pdf Binary file hws/hw02.pdf has changed diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw02.tex --- a/hws/hw02.tex Tue Oct 28 06:01:00 2014 +0000 +++ b/hws/hw02.tex Tue Oct 28 12:24:11 2014 +0000 @@ -50,7 +50,7 @@ Write down clearly in each case what you need to prove and what are the assumptions. -\item Define what is mean by the derivative of a regular expressions +\item Define what is meant by the derivative of a regular expressions with respoect to a character. (Hint: The derivative is defined recursively.) diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw03.pdf Binary file hws/hw03.pdf has changed diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw03.tex --- a/hws/hw03.tex Tue Oct 28 06:01:00 2014 +0000 +++ b/hws/hw03.tex Tue Oct 28 12:24:11 2014 +0000 @@ -59,7 +59,10 @@ a transition for every letter from the alphabet.) \begin{center} - \begin{tikzpicture}[scale=2, line width=0.7mm] + \begin{tikzpicture}[>=stealth',very thick,auto, + every state/.style={minimum size=0pt, + inner sep=2pt,draw=blue!50,very thick, + fill=blue!20},scale=2] \node[state, initial] (q0) at ( 0,1) {$q_0$}; \node[state, accepting] (q1) at ( 1,1) {$q_1$}; \path[->] (q0) edge node[above] {$a$} (q1) @@ -92,7 +95,10 @@ recognises the same language: \begin{center} - \begin{tikzpicture}[scale=3, line width=0.7mm] + \begin{tikzpicture}[>=stealth',very thick,auto, + every state/.style={minimum size=0pt, + inner sep=2pt,draw=blue!50,very thick, + fill=blue!20},scale=2] \node[state, initial] (q0) at ( 0,1) {$q_0$}; \node[state] (q1) at ( 1,1) {$q_1$}; \node[state, accepting] (q2) at ( 2,1) {$q_2$}; @@ -108,7 +114,10 @@ case states can be merged, state clearly which states can be merged. \begin{center} - \begin{tikzpicture}[scale=2, line width=0.7mm] + \begin{tikzpicture}[>=stealth',very thick,auto, + every state/.style={minimum size=0pt, + inner sep=2pt,draw=blue!50,very thick, + fill=blue!20},scale=2] \node[state, initial] (q0) at ( 0,1) {$q_0$}; \node[state] (q1) at ( 1,1) {$q_1$}; \node[state, accepting] (q4) at ( 2,1) {$q_4$}; @@ -129,7 +138,10 @@ \item Given the following finite deterministic automaton over the alphabet $\{a, b\}$: \begin{center} - \begin{tikzpicture}[scale=2, line width=0.5mm] + \begin{tikzpicture}[scale=2,>=stealth',very thick,auto, + every state/.style={minimum size=0pt, + inner sep=2pt,draw=blue!50,very thick, + fill=blue!20}] \node[state, initial, accepting] (q0) at ( 0,1) {$q_0$}; \node[state, accepting] (q1) at ( 1,1) {$q_1$}; \node[state] (q2) at ( 2,1) {$q_2$}; diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw04.pdf Binary file hws/hw04.pdf has changed diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw04.tex --- a/hws/hw04.tex Tue Oct 28 06:01:00 2014 +0000 +++ b/hws/hw04.tex Tue Oct 28 12:24:11 2014 +0000 @@ -1,8 +1,6 @@ \documentclass{article} \usepackage{../style} - -\usepackage{tikz} -\usetikzlibrary{automata} +\usepackage{../graphics} \begin{document} diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw05.pdf Binary file hws/hw05.pdf has changed diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw05.tex --- a/hws/hw05.tex Tue Oct 28 06:01:00 2014 +0000 +++ b/hws/hw05.tex Tue Oct 28 12:24:11 2014 +0000 @@ -1,12 +1,7 @@ \documentclass{article} -\usepackage{charter} -\usepackage{hyperref} -\usepackage{amssymb} -\usepackage{amsmath} -\usepackage{tikz} -\usetikzlibrary{automata} +\usepackage{../style} +\usepackage{../graphics} -\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% for definitions \begin{document} diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw06.pdf Binary file hws/hw06.pdf has changed diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw06.tex --- a/hws/hw06.tex Tue Oct 28 06:01:00 2014 +0000 +++ b/hws/hw06.tex Tue Oct 28 12:24:11 2014 +0000 @@ -1,12 +1,6 @@ \documentclass{article} -\usepackage{charter} -\usepackage{hyperref} -\usepackage{amssymb} -\usepackage{amsmath} -\usepackage{tikz} -\usetikzlibrary{automata} - -\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% for definitions +\usepackage{../style} +\usepackage{../graphics} \begin{document} diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw07.pdf Binary file hws/hw07.pdf has changed diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw07.tex --- a/hws/hw07.tex Tue Oct 28 06:01:00 2014 +0000 +++ b/hws/hw07.tex Tue Oct 28 12:24:11 2014 +0000 @@ -1,12 +1,6 @@ \documentclass{article} -\usepackage{charter} -\usepackage{hyperref} -\usepackage{amssymb} -\usepackage{amsmath} -\usepackage{tikz} -\usetikzlibrary{automata} - -\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% for definitions +\usepackage{../style} +\usepackage{../graphics} \begin{document} diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw08.pdf Binary file hws/hw08.pdf has changed diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw08.tex --- a/hws/hw08.tex Tue Oct 28 06:01:00 2014 +0000 +++ b/hws/hw08.tex Tue Oct 28 12:24:11 2014 +0000 @@ -1,27 +1,26 @@ \documentclass{article} -\usepackage{charter} -\usepackage{hyperref} -\usepackage{amssymb} -\usepackage{amsmath} -\usepackage{tikz} -\usetikzlibrary{automata} - -\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% for definitions +\usepackage{../style} +\usepackage{../graphics} \begin{document} \section*{Homework 8} \begin{enumerate} -\item Write a program in the WHILE-language that calculates the factorial function. +\item Write a program in the WHILE-language that calculates + the factorial function. -\item What optimisations could a compiler perform when compiling a WHILE-program? +\item What optimisations could a compiler perform when + compiling a WHILE-program? -\item What is the main difference between the Java assembler (as processed by Jasmin) and -Java Byte Code? +\item What is the main difference between the Java assembler + (as processed by Jasmin) and Java Byte Code? -\item Parser combinators can directly be given a string as input, without the need of a lexer. What are -the advantages to first lex a string and then feed a sequence of tokens as input to the parser? +\item Parser combinators can directly be given a string as + input, without the need of a lexer. What are the + advantages to first lex a string and then feed a + sequence of tokens as input to the parser? + \end{enumerate} \end{document} diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw09.pdf Binary file hws/hw09.pdf has changed diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw09.tex --- a/hws/hw09.tex Tue Oct 28 06:01:00 2014 +0000 +++ b/hws/hw09.tex Tue Oct 28 12:24:11 2014 +0000 @@ -1,25 +1,20 @@ \documentclass{article} -\usepackage{charter} -\usepackage{hyperref} -\usepackage{amssymb} -\usepackage{amsmath} -\usepackage{tikz} -\usetikzlibrary{automata} - -\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% for definitions +\usepackage{../style} +\usepackage{../graphics} \begin{document} \section*{Homework 9} \begin{enumerate} -\item Describe what is meant by \emph{eliminating tail recursion}, when such an -optimization can be applied and why it is a benefit? +\item Describe what is meant by \emph{eliminating tail + recursion}, when such an optimization can be applied and + why it is a benefit? \item It is true (I confirmed it) that -\begin{center} -if $\varnothing$ does not occur in $r$ \;\;then\;\;$L(r) \not= \{\}$ +\begin{center} if $\varnothing$ does not occur in $r$ +\;\;then\;\;$L(r) \not= \{\}$ \end{center} \noindent