updated 5th handout
authorChristian Urban <christian dot urban at kcl dot ac dot uk>
Wed, 29 Oct 2014 13:08:11 +0000
changeset 263 8a42736cce27
parent 262 57269d9931da
child 264 0079db1a1c9d
updated 5th handout
handouts/ho01.pdf
handouts/ho01.tex
handouts/ho03.pdf
handouts/ho03.tex
handouts/ho04.pdf
handouts/ho04.tex
handouts/ho05.pdf
handouts/ho05.tex
Binary file handouts/ho01.pdf has changed
--- a/handouts/ho01.tex	Tue Oct 28 16:33:53 2014 +0000
+++ b/handouts/ho01.tex	Wed Oct 29 13:08:11 2014 +0000
@@ -586,8 +586,8 @@
 place each year. The book by Bruce Schneier about Applied
 Cryptography is also recommendable, though quite expensive.
 There is also another expensive book about penetration
-testing, but the readable chapter about passwords (Chapter 9)
-is free:
+testing, but the readable chapter about password attacks
+(Chapter 9) is free:
 
 \begin{center}
 \url{http://www.nostarch.com/pentesting}
Binary file handouts/ho03.pdf has changed
--- a/handouts/ho03.tex	Tue Oct 28 16:33:53 2014 +0000
+++ b/handouts/ho03.tex	Wed Oct 29 13:08:11 2014 +0000
@@ -32,7 +32,7 @@
     xmin=1996.5,
     xmax=2015,
     ymax=21,
-    ytick={0,2,...,20},
+    ytick={0,5,...,20},
     scaled ticks=false,
     axis lines=left,
     width=12cm,
@@ -48,10 +48,10 @@
 \end{tikzpicture}
 \end{center}
 
-\noindent
-This statistics seems to indicate that in the last five years the
-number of buffer overflow attacks is around 10\% of all attacks
-(whereby the absolute numbers of attacks seem to grow each year).
+\noindent This statistics indicates that in the last
+five years or so the number of buffer overflow attacks is
+around 10\% of all attacks (whereby the absolute numbers of
+attacks grow each year).
 
 
 To understand how buffer overflow attacks work, we have to have
@@ -173,18 +173,18 @@
 once \code{foo} has finished. The last stack pointer
 (\code{sp}) is needed in order to clean up the stack to the
 last level---in fact there is no cleaning involved, but just
-the top of the stack will be set back. So the last stack
-pointer also needs to be stored. The two buffers inside
-\pcode{foo} are on the stack too, because they are local data
-within \code{foo}. Consequently the stack in the middle is a
-snapshot after Line 3 has been executed. In case you are
-familiar with assembly instructions you can also read off this
-behaviour from the machine code that the \code{gcc} compiler
-generates for the program above:\footnote{You can make
-\pcode{gcc} generate assembly instructions if you call it with
-the \pcode{-S} option, for example \pcode{gcc -S out in.c}\;.
-Or you can look at this code by using the debugger. How to do
-this will be explained later.}
+the top of the stack will be set back to this address. So the
+last stack pointer also needs to be stored. The two buffers
+inside \pcode{foo} are on the stack too, because they are
+local data within \code{foo}. Consequently the stack in the
+middle is a snapshot after Line 3 has been executed. In case
+you are familiar with assembly instructions you can also read
+off this behaviour from the machine code that the \code{gcc}
+compiler generates for the program above:\footnote{You can
+make \pcode{gcc} generate assembly instructions if you call it
+with the \pcode{-S} option, for example \pcode{gcc -S out
+in.c}\;. Or you can look at this code by using the debugger.
+How to do this will be explained later.}
 
 \begin{center}\small
 \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}}
@@ -201,16 +201,18 @@
 \pcode{main} prepares in Lines 2 to 7 the stack before calling
 the function \pcode{foo}. You can see that the numbers 3, 2, 1
 are stored on the stack (the register \code{$esp} refers to
-the top of the stack). On the right you can see how the
-function \pcode{foo} stores the two local buffers onto the
-stack and initialises them with the given data (Lines 2 to 9).
-Since there is no real computation going on inside
-\pcode{foo}, the function then just restores the stack to its
-old state and crucially sets the return address where the
-computation should resume (Line 9 in the code on the left-hand
-side). The instruction \code{ret} then transfers control back
-to the function \pcode{main} to the the instruction just after
-the call to \pcode{foo}, that is Line 9.
+the top of the stack; \pcode{$0x1}, \pcode{$0x2} \pcode{$0x3}
+are the encodings for \pcode{1} to \pcode{3}). On the right
+you can see how the function \pcode{foo} stores the two local
+buffers onto the stack and initialises them with the given
+data (Lines 2 to 9). Since there is no real computation going
+on inside \pcode{foo}, the function then just restores the
+stack to its old state and crucially sets the return address
+where the computation should resume (Line 9 in the code on the
+left-hand side). The instruction \code{ret} then transfers
+control back to the function \pcode{main} to the the
+instruction just after the call to \pcode{foo}, that is Line
+9.
  
 Another part of the ``conspiracy'' of buffer overflow attacks
 is that library functions in C look typically as follows:
@@ -297,7 +299,7 @@
 buffer, is stored on the stack before the older items, like
 return address and arguments. If it had be the other way
 around, then such an overwriting by overflowing a local buffer
-would just not work. If the designers of C had just been able
+would just not work. Had the designers of C had just been able
 to foresee what headaches their way of arranging the stack
 caused in the time where computers are accessible from
 everywhere. 
@@ -386,7 +388,7 @@
 
 \noindent These characters represent the machine code for
 opening a shell. It seems obtaining such a string requires
-higher-education in the architecture of the target system. But
+``higher-education'' in the architecture of the target system. But
 it is actually relatively simple: First there are many such
 string ready-made---just a quick Google query away. Second,
 tools like the debugger can help us again. We can just write
@@ -399,20 +401,21 @@
 the machine code, or even the ready-made encoding as character
 sequence. 
 
-While easy, obtaining this string is not entirely trivial.
-Remember the functions in C that copy or fill buffers work
-such that they copy everything until the zero byte is reached.
-Unfortunately the ``vanilla'' output from the debugger for the
-shell-program above will contain such zero bytes. So a
-post-processing phase is needed to rewrite the machine code in
-a way that it does not contain any zero bytes. This is like
-some works of literature that have been written so that the
-letter e, for example, is avoided. The technical term for such
-a literature work is \emph{lipogram}.\footnote{The most
-famous example of a lipogram is a 50,000 words novel titled
-Gadsby, see \url{https://archive.org/details/Gadsby}.} For
-rewriting the machine code, you might need to use clever
-tricks like
+While easy, obtaining this string is not entirely trivial
+using \pcode{gdb}. Remember the functions in C that copy or
+fill buffers work such that they copy everything until the
+zero byte is reached. Unfortunately the ``vanilla'' output
+from the debugger for the shell-program above will contain
+such zero bytes. So a post-processing phase is needed to
+rewrite the machine code in a way that it does not contain any
+zero bytes. This is like some works of literature that have
+been written so that the letter e, for example, is avoided.
+The technical term for such a literature work is
+\emph{lipogram}.\footnote{The most famous example of a
+lipogram is a 50,000 words novel titled Gadsby, see
+\url{https://archive.org/details/Gadsby}, which avoids the 
+letter `e' throughout.} For rewriting the
+machine code, you might need to use clever tricks like
 
 \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]
 xor %eax, %eax
@@ -485,7 +488,7 @@
 amount of time. If we now use an address that lets us jump to
 any address in the grey area we are done. The target machine
 will execute these \pcode{NOP} operations until it reaches the
-shellcode. A moment of thought can convince you that this
+shellcode. A moment of thought should convince you that this
 trick can hugely improve our odds of finding the right
 address---depending on the size of the buffer, it might only
 take a few tries to get the shellcode to run. And then we are
@@ -558,32 +561,32 @@
 
 \lstinputlisting[language=C]{../progs/C5.c}
 
-\noindent Here the programmer actually to take extra care to
-not fall pray to a buffer overflow attack, but in the process
-made the program susceptible to a format string attack.
-Clearly the \pcode{printf} function in Line 7 contains now
-an explicit format string, but because the commandline
-input is copied using the function \pcode{snprintf} the
-result will be the same---the string can be exploited 
-by embedding format strings into the user input. Here the
-programmer really cannot be blamed (much) because by using
-\pcode{snprintf} he or she tried to make sure only 10
-characters get copied into the local buffer---in this way
-avoiding the obvious buffer overflow attack.
+\noindent Here the programmer actually tried to take extra
+care to not fall pray to a buffer overflow attack, but in the
+process made the program susceptible to a format string
+attack. Clearly the \pcode{printf} function in Line 7 contains
+now an explicit format string, but because the commandline
+input is copied using the function \pcode{snprintf} the result
+will be the same---the string can be exploited by embedding
+format strings into the user input. Here the programmer really
+cannot be blamed (much) because by using \pcode{snprintf} he
+or she tried to make sure only 10 characters get copied into
+the local buffer---in this way avoiding the obvious buffer
+overflow attack.
 
 \subsubsection*{Caveats and Defences}
 
-How can we defend against these attacks? Well, a reflex could 
-be to blame programmers. Precautions should be taken so that 
-buffers cannot been overfilled and format strings should not
-be forgotten. This might actually be slightly simpler nowadays 
-since safe versions of the library functions exists, which
-always specify the precise number of bytes that should be 
-copied. Compilers also nowadays provide warnings when format
-strings are omitted. So proper education of programmers is 
-definitely a part of a defence against such attacks. However,
-if we leave it at that, then we have the mess we have today
-with new attacks discovered almost daily. 
+How can we defend against these attacks? Well, a reflex could
+be to blame programmers. Precautions should be taken by them
+so that buffers cannot been overfilled and format strings
+should not be forgotten. This might actually be slightly
+simpler nowadays since safe versions of the library functions
+exist, which always specify the precise number of bytes that
+should be copied. Compilers also nowadays provide warnings
+when format strings are omitted. So proper education of
+programmers is definitely a part of a defence against such
+attacks. However, if we leave it at that, then we have the
+mess we have today with new attacks discovered almost daily. 
 
 There is actually a quite long record of publications
 proposing defences against buffer overflow attacks. One method
@@ -711,9 +714,11 @@
 attacks.
 \bigskip
 
-\noindent If you want to know more about buffer overflow
-attacks, the original Phrack article ``Smashing The Stack For
-Fun And Profit'' by Elias Levy (also known as Aleph One) is an
+\subsubsection*{Further Reading}
+
+If you want to know more about buffer overflow attacks, the
+original Phrack article ``Smashing The Stack For Fun And
+Profit'' by Elias Levy (also known as Aleph One) is an
 engaging read:
 
 \begin{center}
Binary file handouts/ho04.pdf has changed
--- a/handouts/ho04.tex	Tue Oct 28 16:33:53 2014 +0000
+++ b/handouts/ho04.tex	Wed Oct 29 13:08:11 2014 +0000
@@ -285,13 +285,13 @@
 \subsubsection*{Multi-Agent Access Control}
 
 In military or banking, for example, very critical decisions
-need to be made using a \emph{two-men rule}. This means such
+need to be made using a \emph{two-people rule}. This means such
 decisions need to be taken by two people together, so that no
 single person can defraud a bank or start a nuclear war (you
 will know what I mean if you have seen the classic movie ``Dr
 Strangelove or: How I Learned to Stop Worrying and Love the
 Bomb''\footnote{\url{http://en.wikipedia.org/wiki/Dr._Strangelove}}).
-Translating the two-men rule into a software system seems not
+Translating the two-people rule into a software system seems not
 as straightforward as one might think.
 
 Let us assume we want to implement a system where CEOs can
Binary file handouts/ho05.pdf has changed
--- a/handouts/ho05.tex	Tue Oct 28 16:33:53 2014 +0000
+++ b/handouts/ho05.tex	Wed Oct 29 13:08:11 2014 +0000
@@ -7,27 +7,170 @@
 
 \section*{Handout 5 (Protocols)}
 
+Protocols are the computer science equivalent to fractals and
+the Mandelbrot set in mathematics. With the latter you have a
+simple formula which you just iterate and then you test
+whether a point is inside or outside a region, and voila
+something magically
+happened.\footnote{\url{http://en.wikipedia.org/wiki/Fractal},
+\url{http://en.wikipedia.org/wiki/Mandelbrot_set}} Protocols
+are similar: they are simple exchanges of messages, but in the
+end something ``magical'' can happen---for example a secret
+channel has been established or two entities have
+authenticated themselves to each other. The problem with magic
+is of course it is poorly understood and even experts often
+got, and get, it wrong with protocols. 
+
+To have an idea what kind of protocols we are interested, let
+us look at a few examples. One example are (wireless) key 
+fobs which operate the central locking system and the
+ignition in a car.
+
+\begin{center}
+\includegraphics[scale=0.075]{../pics/keyfob.jpg}
+\quad
+\includegraphics[scale=0.2025]{../pics/startstop.jpg}
+\end{center}
+
+\noindent The point of these key fobs is that everything is
+done over the ``air''---there is no physical connection
+between the key, doors and engine. So we must achieve security
+by exchanging certain messages between the key fob on one side
+and doors and engine on the other. Clearly what we like to
+achieve is that I can get into my car and start it, but that
+thieves are kept out. The problem is that everybody can
+``overhear'' or skim the exchange of messages between the key
+fob and car. In this scenario the simplest attack you need to
+defend against is a person-in-the-middle attack. Imagine you
+park your car in front of a supermarket. One thief follows you
+with a strong transmitter. A second thief ``listens'' to the
+signal from the car and wirelessly transmits it to the
+``colleague'' who followed you and who silently enquires about
+the answer from the key fob. The answer is then send back to
+the thief at the car, which then dutifully opens and possibly
+starts. No need to steal your key anymore.
+
+But there are many more such protocols we like to consider.
+Other examples are wifi---you might sit at a Starbucks and
+talk wirelessly to the free access point there and from there
+talk with your bank, for example. Also even if your have to
+touch your Oyster card at the reader each time you enter and
+exit the Tube, it actually operates wirelessly and with
+appropriate equipment over some quite large distance. But
+there are many many more examples (Bitcoins, mobile
+phones,\ldots). The common characteristics of the protocols we
+are interested in here is that an adversary or attacker is
+assumed to be in complete control over the network or channel
+over which you exchanging messages. An attacker can install a
+packet sniffer on a network, inject packets, modify packets,
+replay old messages, or fake pretty much everything. In this
+hostile environment, the purpose of protocols (that is
+exchange of messages) is to achieve some security goal, for
+example only allow the owner of the car in but everybody else
+should be kept out.
+
 The protocols we are interested here are generic descriptions
 of how to exchange messages in order to achieve a goal, be it
 establishing a mutual secure connection or being able to
-authenticate to a system. Our notion of protocol is
-deliberately quite general: it includes situations like the
-messages send between a key fob and a car in order to open
-doors or the messages that participants need to exchange in
-order to mine Bitcoins (which is often already called Bitcoin
-\emph{protocol}).
+authenticate to a system. Unlike the distant past where for
+example we had to meet a person in order to authenticate him
+or her (via a passport for example), the problem we are facing
+on the Internet is that we cannot easily be sure who we are
+``talking'' to. The obvious reason is that only some electrons
+arrive at our computer; we do not see the person, or computer,
+behind the incoming electrons (messages). 
+
+To start, let us look at one of the simplest protocols that
+are part of the TCP protocol (which underlies the Internet).
+This protocol does not do anything security relevant, it just
+establishes a ``hello'' from a client to a server which the
+server answers with ``I heard you'' and the client answers 
+in turn with something like ``thanks''. This protocol
+is often called a \emph{three-way handshake}. Graphically it
+can be illustrated as follows
+
+\begin{center}
+\includegraphics[scale=0.5]{../pics/handshake.png}
+\end{center}
+
+\noindent On the left-hand side is a client, say Alice, on the
+right-hand side is a server, say. Time is running from top to
+bottom. Alice initial SYN message needs some time to travel to
+the server. The server answers with SYN-ACK, which will
+require some time to arrive at Alice. Her answer ACK will
+again take some time to arrive at the server. After the 
+messages are exchanged Alice and the server simply have 
+established a channel to communicate over. Alice does
+not know whether she is really talking to the server (somebody 
+else on the network might have intercepted her message
+and replied in place of the server). Similarly, the
+server has no idea who it is talking to. That this can be 
+established depends on what is exchanged next and is the
+point of the protocols we want to study in more detail.
+
+Before we start in earnest, we need to fix a more
+convenient notation for protocols. Drawing pictures like
+the one above would be awkward in the long-run. The
+notation already abstracts away from a few details we are
+not interested in: for example the time the messages
+need to travel between endpoints. What we are interested
+in is in which order the messages are sent. For the SYN-ACK
+protocol we will therefore use the notation 
 
-Unlike the distant past where for example we had to meet a
-person in order to authenticate him or her (via a passport for
-example), the problem we are facing is that on the Internet we
-cannot easily be sure who we are ``talking'' to. The obvious
-reason is that only some electrons arrive at our computer; we
-do not see the person, or computer, behind the incoming
-electrons. Often there are is also no person behind the
-messages, rather than a computer system.
+\begin{center}
+\begin{tabular}{l@{\hspace{2mm}}l}
+$A \to S$: & $SYN$\\
+$S \to A$: & $SYN\_ACK$\\
+$A \to S$: & $ACK$\\
+\end{tabular}
+\end{center}
+
+\noindent The left-hand side specifies who is the sender and
+who is the receiver of the message. On the right of the colon
+is the message that is send. The order from top to down
+specifies in which order the messages are sent. We also
+have the convention that messages like above $SYN$ are send
+in clear-text over the network. If we want that a message is 
+encrypted, then we use the notation
+
+\[
+\{msg\}_{K_{AB}}
+\]  
+  
+  
+\noindent for messages. The curly braces indicate a kind of
+envelope which can only be opened if you know the key $K_{AB}$
+with which the message has been encrypted. We always assume
+that an attacker, say Eve, cannot get the content of the
+message, unless she is also in the possession of the key. We
+explicitly exclude in our study that the encryption can be
+broken.\footnote{\ldots{}which of course is what a good
+protocol designer needs to ensure and more often than not
+protocols are broken. For example Oyster cards contain a very
+weak encryption mechanism which has been attacked.} It is also
+possible that an encrypted message contains several parts. In
+this case we would write something like
+
+\[
+\{msg_1, msg_2\}_{K_{AB}}
+\] 
+
+\noindent But again Eve would not be able to know 
+this unless she also has the key. We also allow the 
+possibility that a message is encrypted twice under 
+different keys. In this case we write
+
+\[
+\{\{msg\}_{K_{AB}}\}_{K_{BC}}
+\] 
 
 
 
+Note, however,
+while an attacker cannot obtain the content of the message
+without the key, this encrypted message can be observed
+and be recorded and then replayed at another time.
+
 Keyfobs - protocol
 
 {\small