# HG changeset patch
# User Christian Urban <christian dot urban at kcl dot ac dot uk>
# Date 1414499051 0
# Node ID 7ed2a25dd115d793889e7140ff3d27786f08f8f0
# Parent  201c2c6d86964fa4e3b6d31cd5f3492b7e47c03e
updated

diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho03.pdf
Binary file handouts/ho03.pdf has changed
diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho03.tex
--- a/handouts/ho03.tex	Tue Oct 28 06:01:00 2014 +0000
+++ b/handouts/ho03.tex	Tue Oct 28 12:24:11 2014 +0000
@@ -113,7 +113,7 @@
 the set of strings accepted by an automaton.
   
 
-While with DFAs it will always clear that given a character
+While with DFAs it will always be clear that given a character
 what the next state is (potentially none), it will be useful
 to relax this restriction. That means we have several
 potential successor states. We even allow ``silent
@@ -374,10 +374,10 @@
 
 \subsubsection*{Subset Construction}
 
-What is interesting that for every NFA we can find a DFA which
-recognises the same language. This can be done by the
-\emph{subset construction}. Consider again the NFA on the 
-left, consisting of nodes labeled $0$, $1$ and $2$. 
+What is interesting is that for every NFA we can find a DFA
+which recognises the same language. This can, for example, be
+done by the \emph{subset construction}. Consider again the NFA
+on the left, consisting of nodes labeled $0$, $1$ and $2$. 
 
 \begin{center}
 \begin{tabular}{c@{\hspace{10mm}}c}
@@ -440,27 +440,174 @@
 can be calculated from the starting state of the NFA, that is
 $0$, and then do as many $\epsilon$-transitions as possible.
 This gives $\{0,1,2\}$ which is the starting state of the DFA.
-One terminal states in the DFA are given by all sets that
+The terminal states in the DFA are given by all sets that
 contain a $2$, which is the terminal state of the NFA. This
 completes the subset construction.
 
-There are two points to note: One is that the resulting DFA
-contains a number of ``dead'' nodes that are never reachable
-from the starting state (that is that the calculated DFA in
-this example is not a minimal DFA). Such dead nodes can be
-safely removed without changing the language that is
-recognised by the DFA. Another point is that in some cases the
-subset construction produces a DFA that does \emph{not}
-contain any dead nodes\ldots{}that means it calculates a
-minimal DFA. Which in turn means that in some cases the number
-of nodes by going from NFAs to DFAs exponentially increases,
-namely by $2^n$ (which is the number of subsets you can form
-for $n$ nodes). 
+There are two points to note: One is that very often the
+resulting DFA contains a number of ``dead'' nodes that are
+never reachable from the starting state (that is that the
+calculated DFA in this example is not a minimal DFA). Such
+dead nodes can be safely removed without changing the language
+that is recognised by the DFA. Another point is that in some
+cases, however, the subset construction produces a DFA that
+does \emph{not} contain any dead nodes\ldots{}that means it
+calculates a minimal DFA. Which in turn means that in some
+cases the number of nodes by going from NFAs to DFAs
+exponentially increases, namely by $2^n$ (which is the number
+of subsets you can form for $n$ nodes). 
 
 
 \subsubsection*{Brzozowski's Method}
 
-DFA -> NFA
+As said before, we can also go into the other direction---from
+DFAs to regular expressions. Brzozowski's method calculates
+a regular expression using familiar transformations for
+solving equational systems. Consider the DFA:
+
+\begin{center}
+\begin{tikzpicture}[scale=1.5,>=stealth',very thick,auto,
+                    every state/.style={minimum size=0pt,
+                    inner sep=2pt,draw=blue!50,very thick,
+                    fill=blue!20}]
+  \node[state, initial]        (q0) at ( 0,1) {$q_0$};
+  \node[state]                    (q1) at ( 1,1) {$q_1$};
+  \node[state, accepting] (q2) at ( 2,1) {$q_2$};
+  \path[->] (q0) edge[bend left] node[above] {$a$} (q1)
+            (q1) edge[bend left] node[above] {$b$} (q0)
+            (q2) edge[bend left=50] node[below] {$b$} (q0)
+            (q1) edge node[above] {$a$} (q2)
+            (q2) edge [loop right] node {$a$} ()
+            (q0) edge [loop below] node {$b$} ();
+\end{tikzpicture}
+\end{center}
+
+\noindent for which we can set up the following equational
+system
+
+\begin{eqnarray}
+q_0 & = & \epsilon + q_0\,b + q_1\,b +  q_2\,b\\
+q_1 & = & q_0\,a\\
+q_2 & = & q_1\,a + q_2\,a
+\end{eqnarray}
+
+\noindent There is an equation for each node in the DFA. Let
+us have a look how the right-hand sides of the equations are
+constructed. First have a look at the second equation: the
+left-hand side is $q_1$ and the right-hand side $q_0\,a$. The
+right-hand side is essentially all possible ways how to end up
+in $q_1$. There is only one incoming edge from $q_0$ consuming
+an $a$. Therefore we say: if we are in $q_0$ consuming an $a$
+then we end up in $q_1$. Therefore the right hand side is
+state followed by character---in this case $q_0\,a$. Now lets
+have a look at the third equation: there are two incoming
+edges. Therefore we have two terms, namely $q_1\,a$ and
+$q_2\,a$. These terms are separated by $+$. The first states
+that if in state $q_1$ consuming an $a$ will bring you to
+$q_2$, and the secont that being in $q_2$ and consuming an $a$
+will make you stay in $q_2$. The right-hand side of the
+first equation is constructed similarly: there are three 
+incoming edges, therefore there are three terms. There is
+one exception in that we also ``add'' $\epsilon$ to the
+first equation, because it corresponds to the starting state
+in the DFA.
+
+Having constructed the equational system, the question is
+how to solve it? Remarkably the rules are very similar to
+solving usual linear equational systems. For example the
+second equation does not contain the variable $q_1$ on the
+right-hand side of the equation. We can therefore eliminate 
+$q_1$ from the system by just substituting this equation
+into the other two. This gives
+
+\begin{eqnarray}
+q_0 & = & \epsilon + q_0\,b + q_0\,a\,b +  q_2\,b\\
+q_2 & = & q_0\,a\,a + q_2\,a
+\end{eqnarray}
+
+\noindent where in Equation (4) we have two occurences
+of $q_0$. Like the laws about $+$ and $\cdot$, we can simplify 
+Equation (4) to obtain the following two equations:
+
+\begin{eqnarray}
+q_0 & = & \epsilon + q_0\,(b + a\,b) +  q_2\,b\\
+q_2 & = & q_0\,a\,a + q_2\,a
+\end{eqnarray}
+ 
+\noindent Unfortunately we cannot make any more progress with
+substituting equations, because both (6) and (7) contain the
+variable on the left-hand side also on the right-hand side.
+Here we need to now use a law that is different from the usual
+laws. It is called \emph{Arden's rule}. It states that
+if an equation is of the form $q = q\,r + s$ then it can be
+transformed to $q = s\, r^*$. Since we can assume $+$ is 
+symmetric, equation (7) is of that form: $s$ is $q_0\,a\,a$
+and $r$ is $a$. That means we can transform Equation (7)
+to obtain the two new equations
+
+\begin{eqnarray}
+q_0 & = & \epsilon + q_0\,(b + a\,b) +  q_2\,b\\
+q_2 & = & q_0\,a\,a\,(a^*)
+\end{eqnarray}
+
+\noindent Now again we can substitute the second equation into
+the first in order to eliminate the variable $q_2$.
+
+\begin{eqnarray}
+q_0 & = & \epsilon + q_0\,(b + a\,b) +  q_0\,a\,a\,(a^*)\,b
+\end{eqnarray}
+
+\noindent Pulling $q_0$ out as a single factor gives:
+
+\begin{eqnarray}
+q_0 & = & \epsilon + q_0\,(b + a\,b + a\,a\,(a^*)\,b)
+\end{eqnarray}
+
+\noindent This equation is again of the form so that we can
+apply Arden's rule ($r$ is $b + a\,b + a\,a\,(a^*)\,b$ and $s$
+is $\epsilon$). This gives as solution for $q_0$ the following
+regular expression:
+
+\begin{eqnarray}
+q_0 & = & \epsilon\,(b + a\,b + a\,a\,(a^*)\,b)^*
+\end{eqnarray}
+
+\noindent SInce this is a regular expression, we can simplify
+away the $\epsilon$ to obtain the slightly simpler regular
+expression
+
+\begin{eqnarray}
+q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^*
+\end{eqnarray}
+
+\noindent 
+Now we can unwind this process and obtain the solutions
+for the other equations. This gives:
+
+\begin{eqnarray}
+q_0 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\\
+q_1 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\,a\\
+q_2 & = & (b + a\,b + a\,a\,(a^*)\,b)^*\,a\,a\,(a)^*
+\end{eqnarray}
+
+\noindent Finally, we only need to ``add'' up the equations
+which correspond to a terminal state. In our running example,
+this is just $q_2$. Consequently, a regular expression
+that recognises the same language as the automaton is
+
+\[
+(b + a\,b + a\,a\,(a^*)\,b)^*\,a\,a\,(a)^*
+\]
+
+\noindent You can somewhat crosscheck your solution
+by taking a string the regular expression can match and
+and see whether it can be matched by the automaton.
+One string for example is $aaa$ and \emph{voila} this 
+string is also matched by the automaton.
+
+We should prove that Brzozowski's method really produces
+an equivalent  regular expression for the automaton. But
+for the purposes of this module, we omit this.
 
 \subsubsection*{Automata Minimization}
 
@@ -694,7 +841,18 @@
 the accepting and non-accepting states. You can then translate
 this DFA back into a regular expression. 
 
-Not all languages are regular\ldots{}
+Not all languages are regular. The most well-known example
+of a language that is not regular consists of all the strings
+of the form
+
+\[a^n\,b^n\]
+
+\noindent meaning strings that have the same number of $a$s
+and $b$s. You can try, but you cannot find a regular
+expression for this language and also not an automaton. One
+can actually prove that there is no regular expression nor
+automaton for this language, but again that would lead us too
+far afield for what we want to do in this module.
 
 
 \end{document}
diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho05.pdf
Binary file handouts/ho05.pdf has changed
diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho05.tex
--- a/handouts/ho05.tex	Tue Oct 28 06:01:00 2014 +0000
+++ b/handouts/ho05.tex	Tue Oct 28 12:24:11 2014 +0000
@@ -1,59 +1,25 @@
 \documentclass{article}
-\usepackage{hyperref}
-\usepackage{amssymb}
-\usepackage{amsmath}
-\usepackage[T1]{fontenc}
-\usepackage{tikz}
-\usetikzlibrary{arrows}
-\usetikzlibrary{automata}
-\usetikzlibrary{shapes}
-\usetikzlibrary{shadows}
-\usetikzlibrary{positioning}
-\usetikzlibrary{calc}
-\usetikzlibrary{fit}
-\usetikzlibrary{backgrounds}
+\usepackage{../style}
 \usepackage{../langs}
-
-\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}%
+\usepackage{../graphics}
 
-	
-\newcommand\grid[1]{%
-\begin{tikzpicture}[baseline=(char.base)]
-  \path[use as bounding box]
-    (0,0) rectangle (1em,1em);
-  \draw[red!50, fill=red!20]
-    (0,0) rectangle (1em,1em);
-  \node[inner sep=1pt,anchor=base west]
-    (char) at (0em,\gridraiseamount) {#1};
-\end{tikzpicture}}
-\newcommand\gridraiseamount{0.12em}
-
-\makeatletter
-\newcommand\Grid[1]{%
-  \@tfor\z:=#1\do{\grid{\z}}}
-\makeatother	
-
-\newcommand\Vspace[1][.3em]{%
-  \mbox{\kern.06em\vrule height.3ex}%
-  \vbox{\hrule width#1}%
-  \hbox{\vrule height.3ex}}
-
-\def\VS{\Vspace[0.6em]}
-	
 \begin{document}
 
-\section*{Handout 5}
+\section*{Handout 5 (Lexing)}
 
-Whenever you want to design a new programming language or implement a compiler for an
-existing language, the first task is to fix the basic ``words'' of the language. For example what are the 
-keywords, or reserved words, of the language, what are permitted identifiers, numbers, expressions and so 
-on. One convenient way to do this is, of 
-course, by using regular expressions. 
+Whenever you want to design a new programming language or
+implement a compiler for an existing language, the first task
+is to fix the basic ``words'' of the language. For example
+what are the keywords, or reserved words, of the language,
+what are permitted identifiers, numbers, expressions and so
+on. One convenient way to do this is, of course, by using
+regular expressions. 
 
-In this course we want to take a closer look at the 
-WHILE programming language. This is a simple imperative programming language consisting of arithmetic
-expressions, assignments, if-statements and loops. For example the Fibonacci program can be
-written in this language as follows:
+In this course we want to take a closer look at the WHILE
+programming language. This is a simple imperative programming
+language consisting of arithmetic expressions, assignments,
+if-statements and loops. For example the Fibonacci program can
+be written in this language as follows:
 
 \begin{center}
 \mbox{\lstinputlisting[language=while]{../progs/fib.while}}
diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho06.pdf
Binary file handouts/ho06.pdf has changed
diff -r 201c2c6d8696 -r 7ed2a25dd115 handouts/ho06.tex
--- a/handouts/ho06.tex	Tue Oct 28 06:01:00 2014 +0000
+++ b/handouts/ho06.tex	Tue Oct 28 12:24:11 2014 +0000
@@ -42,7 +42,7 @@
 	
 \begin{document}
 
-\section*{Handout 6}
+\section*{Handout 6 (Parser Combinators)}
 
 While regular expressions are very useful for lexing and for recognising
 many patterns in strings (like email addresses), they have their limitations. For
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw01.pdf
Binary file hws/hw01.pdf has changed
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw02.pdf
Binary file hws/hw02.pdf has changed
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw02.tex
--- a/hws/hw02.tex	Tue Oct 28 06:01:00 2014 +0000
+++ b/hws/hw02.tex	Tue Oct 28 12:24:11 2014 +0000
@@ -50,7 +50,7 @@
   Write down clearly in each case what you need to prove and
   what are the assumptions. 
   
-\item Define what is mean by the derivative of a regular expressions
+\item Define what is meant by the derivative of a regular expressions
   with respoect to a character. (Hint: The derivative is defined
   recursively.)
 
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw03.pdf
Binary file hws/hw03.pdf has changed
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw03.tex
--- a/hws/hw03.tex	Tue Oct 28 06:01:00 2014 +0000
+++ b/hws/hw03.tex	Tue Oct 28 12:24:11 2014 +0000
@@ -59,7 +59,10 @@
   a transition for every letter from the alphabet.)
 
   \begin{center}
-    \begin{tikzpicture}[scale=2, line width=0.7mm]
+    \begin{tikzpicture}[>=stealth',very thick,auto,
+                        every state/.style={minimum size=0pt,
+                        inner sep=2pt,draw=blue!50,very thick,
+                        fill=blue!20},scale=2]
       \node[state, initial]        (q0) at ( 0,1) {$q_0$};
       \node[state, accepting]  (q1) at ( 1,1) {$q_1$};
       \path[->] (q0) edge node[above] {$a$} (q1)
@@ -92,7 +95,10 @@
   recognises the same language:
 
   \begin{center}
-    \begin{tikzpicture}[scale=3, line width=0.7mm]
+    \begin{tikzpicture}[>=stealth',very thick,auto,
+                        every state/.style={minimum size=0pt,
+                        inner sep=2pt,draw=blue!50,very thick,
+                        fill=blue!20},scale=2]
       \node[state, initial]        (q0) at ( 0,1) {$q_0$};
       \node[state]                    (q1) at ( 1,1) {$q_1$};
       \node[state, accepting] (q2) at ( 2,1) {$q_2$};
@@ -108,7 +114,10 @@
   case states can be merged, state clearly which states can be merged.
 
   \begin{center}
-    \begin{tikzpicture}[scale=2, line width=0.7mm]
+    \begin{tikzpicture}[>=stealth',very thick,auto,
+                        every state/.style={minimum size=0pt,
+                        inner sep=2pt,draw=blue!50,very thick,
+                        fill=blue!20},scale=2]
       \node[state, initial]        (q0) at ( 0,1) {$q_0$};
       \node[state]                    (q1) at ( 1,1) {$q_1$};
       \node[state, accepting] (q4) at ( 2,1) {$q_4$};
@@ -129,7 +138,10 @@
 \item Given the following finite deterministic automaton over the alphabet $\{a, b\}$:
 
   \begin{center}
-    \begin{tikzpicture}[scale=2, line width=0.5mm]
+    \begin{tikzpicture}[scale=2,>=stealth',very thick,auto,
+                        every state/.style={minimum size=0pt,
+                        inner sep=2pt,draw=blue!50,very thick,
+                        fill=blue!20}]
       \node[state, initial, accepting]        (q0) at ( 0,1) {$q_0$};
       \node[state, accepting]                    (q1) at ( 1,1) {$q_1$};
       \node[state] (q2) at ( 2,1) {$q_2$};
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw04.pdf
Binary file hws/hw04.pdf has changed
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw04.tex
--- a/hws/hw04.tex	Tue Oct 28 06:01:00 2014 +0000
+++ b/hws/hw04.tex	Tue Oct 28 12:24:11 2014 +0000
@@ -1,8 +1,6 @@
 \documentclass{article}
 \usepackage{../style}
-
-\usepackage{tikz}
-\usetikzlibrary{automata}
+\usepackage{../graphics}
 
 \begin{document}
 
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw05.pdf
Binary file hws/hw05.pdf has changed
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw05.tex
--- a/hws/hw05.tex	Tue Oct 28 06:01:00 2014 +0000
+++ b/hws/hw05.tex	Tue Oct 28 12:24:11 2014 +0000
@@ -1,12 +1,7 @@
 \documentclass{article}
-\usepackage{charter}
-\usepackage{hyperref}
-\usepackage{amssymb}
-\usepackage{amsmath}
-\usepackage{tikz}
-\usetikzlibrary{automata}
+\usepackage{../style}
+\usepackage{../graphics}
 
-\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% for definitions
 
 \begin{document}
 
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw06.pdf
Binary file hws/hw06.pdf has changed
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw06.tex
--- a/hws/hw06.tex	Tue Oct 28 06:01:00 2014 +0000
+++ b/hws/hw06.tex	Tue Oct 28 12:24:11 2014 +0000
@@ -1,12 +1,6 @@
 \documentclass{article}
-\usepackage{charter}
-\usepackage{hyperref}
-\usepackage{amssymb}
-\usepackage{amsmath}
-\usepackage{tikz}
-\usetikzlibrary{automata}
-
-\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% for definitions
+\usepackage{../style}
+\usepackage{../graphics}
 
 \begin{document}
 
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw07.pdf
Binary file hws/hw07.pdf has changed
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw07.tex
--- a/hws/hw07.tex	Tue Oct 28 06:01:00 2014 +0000
+++ b/hws/hw07.tex	Tue Oct 28 12:24:11 2014 +0000
@@ -1,12 +1,6 @@
 \documentclass{article}
-\usepackage{charter}
-\usepackage{hyperref}
-\usepackage{amssymb}
-\usepackage{amsmath}
-\usepackage{tikz}
-\usetikzlibrary{automata}
-
-\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% for definitions
+\usepackage{../style}
+\usepackage{../graphics}
 
 \begin{document}
 
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw08.pdf
Binary file hws/hw08.pdf has changed
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw08.tex
--- a/hws/hw08.tex	Tue Oct 28 06:01:00 2014 +0000
+++ b/hws/hw08.tex	Tue Oct 28 12:24:11 2014 +0000
@@ -1,27 +1,26 @@
 \documentclass{article}
-\usepackage{charter}
-\usepackage{hyperref}
-\usepackage{amssymb}
-\usepackage{amsmath}
-\usepackage{tikz}
-\usetikzlibrary{automata}
-
-\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% for definitions
+\usepackage{../style}
+\usepackage{../graphics}
 
 \begin{document}
 
 \section*{Homework 8}
 
 \begin{enumerate}
-\item Write a program in the WHILE-language that calculates the factorial function.
+\item Write a program in the WHILE-language that calculates
+      the factorial function.
 
-\item What optimisations could a compiler perform when compiling a WHILE-program?
+\item What optimisations could a compiler perform when
+      compiling a WHILE-program?
 
-\item What is the main difference between the Java assembler (as processed by Jasmin) and
-Java Byte Code?
+\item What is the main difference between the Java assembler
+      (as processed by Jasmin) and Java Byte Code?
 
-\item Parser combinators can directly be given a string as input, without the need of a lexer. What are
-the advantages to first lex a string and then feed a sequence of tokens as input to the parser? 
+\item Parser combinators can directly be given a string as
+      input, without the need of a lexer. What are the
+      advantages to first lex a string and then feed a
+      sequence of tokens as input to the parser?
+      
 \end{enumerate}
 
 \end{document}
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw09.pdf
Binary file hws/hw09.pdf has changed
diff -r 201c2c6d8696 -r 7ed2a25dd115 hws/hw09.tex
--- a/hws/hw09.tex	Tue Oct 28 06:01:00 2014 +0000
+++ b/hws/hw09.tex	Tue Oct 28 12:24:11 2014 +0000
@@ -1,25 +1,20 @@
 \documentclass{article}
-\usepackage{charter}
-\usepackage{hyperref}
-\usepackage{amssymb}
-\usepackage{amsmath}
-\usepackage{tikz}
-\usetikzlibrary{automata}
-
-\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}% for definitions
+\usepackage{../style}
+\usepackage{../graphics}
 
 \begin{document}
 
 \section*{Homework 9}
 
 \begin{enumerate}
-\item Describe what is meant by \emph{eliminating tail recursion}, when such an 
-optimization can be applied and why it is a benefit?
+\item Describe what is meant by \emph{eliminating tail
+      recursion}, when such an optimization can be applied and
+      why it is a benefit?
 
 \item It is true (I confirmed it) that
 
-\begin{center}
-if $\varnothing$ does not occur in $r$ \;\;then\;\;$L(r) \not= \{\}$
+\begin{center} if $\varnothing$ does not occur in $r$
+\;\;then\;\;$L(r) \not= \{\}$ 
 \end{center}
 
 \noindent