# HG changeset patch # User Christian Urban # Date 1414588091 0 # Node ID 8a42736cce27be6f9c52ca1316d8670751b65e5c # Parent 57269d9931daa57b7a4453a49b5d716411ce55e5 updated 5th handout diff -r 57269d9931da -r 8a42736cce27 handouts/ho01.pdf Binary file handouts/ho01.pdf has changed diff -r 57269d9931da -r 8a42736cce27 handouts/ho01.tex --- a/handouts/ho01.tex Tue Oct 28 16:33:53 2014 +0000 +++ b/handouts/ho01.tex Wed Oct 29 13:08:11 2014 +0000 @@ -586,8 +586,8 @@ place each year. The book by Bruce Schneier about Applied Cryptography is also recommendable, though quite expensive. There is also another expensive book about penetration -testing, but the readable chapter about passwords (Chapter 9) -is free: +testing, but the readable chapter about password attacks +(Chapter 9) is free: \begin{center} \url{http://www.nostarch.com/pentesting} diff -r 57269d9931da -r 8a42736cce27 handouts/ho03.pdf Binary file handouts/ho03.pdf has changed diff -r 57269d9931da -r 8a42736cce27 handouts/ho03.tex --- a/handouts/ho03.tex Tue Oct 28 16:33:53 2014 +0000 +++ b/handouts/ho03.tex Wed Oct 29 13:08:11 2014 +0000 @@ -32,7 +32,7 @@ xmin=1996.5, xmax=2015, ymax=21, - ytick={0,2,...,20}, + ytick={0,5,...,20}, scaled ticks=false, axis lines=left, width=12cm, @@ -48,10 +48,10 @@ \end{tikzpicture} \end{center} -\noindent -This statistics seems to indicate that in the last five years the -number of buffer overflow attacks is around 10\% of all attacks -(whereby the absolute numbers of attacks seem to grow each year). +\noindent This statistics indicates that in the last +five years or so the number of buffer overflow attacks is +around 10\% of all attacks (whereby the absolute numbers of +attacks grow each year). To understand how buffer overflow attacks work, we have to have @@ -173,18 +173,18 @@ once \code{foo} has finished. The last stack pointer (\code{sp}) is needed in order to clean up the stack to the last level---in fact there is no cleaning involved, but just -the top of the stack will be set back. So the last stack -pointer also needs to be stored. The two buffers inside -\pcode{foo} are on the stack too, because they are local data -within \code{foo}. Consequently the stack in the middle is a -snapshot after Line 3 has been executed. In case you are -familiar with assembly instructions you can also read off this -behaviour from the machine code that the \code{gcc} compiler -generates for the program above:\footnote{You can make -\pcode{gcc} generate assembly instructions if you call it with -the \pcode{-S} option, for example \pcode{gcc -S out in.c}\;. -Or you can look at this code by using the debugger. How to do -this will be explained later.} +the top of the stack will be set back to this address. So the +last stack pointer also needs to be stored. The two buffers +inside \pcode{foo} are on the stack too, because they are +local data within \code{foo}. Consequently the stack in the +middle is a snapshot after Line 3 has been executed. In case +you are familiar with assembly instructions you can also read +off this behaviour from the machine code that the \code{gcc} +compiler generates for the program above:\footnote{You can +make \pcode{gcc} generate assembly instructions if you call it +with the \pcode{-S} option, for example \pcode{gcc -S out +in.c}\;. Or you can look at this code by using the debugger. +How to do this will be explained later.} \begin{center}\small \begin{tabular}[t]{@{}c@{\hspace{8mm}}c@{}} @@ -201,16 +201,18 @@ \pcode{main} prepares in Lines 2 to 7 the stack before calling the function \pcode{foo}. You can see that the numbers 3, 2, 1 are stored on the stack (the register \code{$esp} refers to -the top of the stack). On the right you can see how the -function \pcode{foo} stores the two local buffers onto the -stack and initialises them with the given data (Lines 2 to 9). -Since there is no real computation going on inside -\pcode{foo}, the function then just restores the stack to its -old state and crucially sets the return address where the -computation should resume (Line 9 in the code on the left-hand -side). The instruction \code{ret} then transfers control back -to the function \pcode{main} to the the instruction just after -the call to \pcode{foo}, that is Line 9. +the top of the stack; \pcode{$0x1}, \pcode{$0x2} \pcode{$0x3} +are the encodings for \pcode{1} to \pcode{3}). On the right +you can see how the function \pcode{foo} stores the two local +buffers onto the stack and initialises them with the given +data (Lines 2 to 9). Since there is no real computation going +on inside \pcode{foo}, the function then just restores the +stack to its old state and crucially sets the return address +where the computation should resume (Line 9 in the code on the +left-hand side). The instruction \code{ret} then transfers +control back to the function \pcode{main} to the the +instruction just after the call to \pcode{foo}, that is Line +9. Another part of the ``conspiracy'' of buffer overflow attacks is that library functions in C look typically as follows: @@ -297,7 +299,7 @@ buffer, is stored on the stack before the older items, like return address and arguments. If it had be the other way around, then such an overwriting by overflowing a local buffer -would just not work. If the designers of C had just been able +would just not work. Had the designers of C had just been able to foresee what headaches their way of arranging the stack caused in the time where computers are accessible from everywhere. @@ -386,7 +388,7 @@ \noindent These characters represent the machine code for opening a shell. It seems obtaining such a string requires -higher-education in the architecture of the target system. But +``higher-education'' in the architecture of the target system. But it is actually relatively simple: First there are many such string ready-made---just a quick Google query away. Second, tools like the debugger can help us again. We can just write @@ -399,20 +401,21 @@ the machine code, or even the ready-made encoding as character sequence. -While easy, obtaining this string is not entirely trivial. -Remember the functions in C that copy or fill buffers work -such that they copy everything until the zero byte is reached. -Unfortunately the ``vanilla'' output from the debugger for the -shell-program above will contain such zero bytes. So a -post-processing phase is needed to rewrite the machine code in -a way that it does not contain any zero bytes. This is like -some works of literature that have been written so that the -letter e, for example, is avoided. The technical term for such -a literature work is \emph{lipogram}.\footnote{The most -famous example of a lipogram is a 50,000 words novel titled -Gadsby, see \url{https://archive.org/details/Gadsby}.} For -rewriting the machine code, you might need to use clever -tricks like +While easy, obtaining this string is not entirely trivial +using \pcode{gdb}. Remember the functions in C that copy or +fill buffers work such that they copy everything until the +zero byte is reached. Unfortunately the ``vanilla'' output +from the debugger for the shell-program above will contain +such zero bytes. So a post-processing phase is needed to +rewrite the machine code in a way that it does not contain any +zero bytes. This is like some works of literature that have +been written so that the letter e, for example, is avoided. +The technical term for such a literature work is +\emph{lipogram}.\footnote{The most famous example of a +lipogram is a 50,000 words novel titled Gadsby, see +\url{https://archive.org/details/Gadsby}, which avoids the +letter `e' throughout.} For rewriting the +machine code, you might need to use clever tricks like \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}] xor %eax, %eax @@ -485,7 +488,7 @@ amount of time. If we now use an address that lets us jump to any address in the grey area we are done. The target machine will execute these \pcode{NOP} operations until it reaches the -shellcode. A moment of thought can convince you that this +shellcode. A moment of thought should convince you that this trick can hugely improve our odds of finding the right address---depending on the size of the buffer, it might only take a few tries to get the shellcode to run. And then we are @@ -558,32 +561,32 @@ \lstinputlisting[language=C]{../progs/C5.c} -\noindent Here the programmer actually to take extra care to -not fall pray to a buffer overflow attack, but in the process -made the program susceptible to a format string attack. -Clearly the \pcode{printf} function in Line 7 contains now -an explicit format string, but because the commandline -input is copied using the function \pcode{snprintf} the -result will be the same---the string can be exploited -by embedding format strings into the user input. Here the -programmer really cannot be blamed (much) because by using -\pcode{snprintf} he or she tried to make sure only 10 -characters get copied into the local buffer---in this way -avoiding the obvious buffer overflow attack. +\noindent Here the programmer actually tried to take extra +care to not fall pray to a buffer overflow attack, but in the +process made the program susceptible to a format string +attack. Clearly the \pcode{printf} function in Line 7 contains +now an explicit format string, but because the commandline +input is copied using the function \pcode{snprintf} the result +will be the same---the string can be exploited by embedding +format strings into the user input. Here the programmer really +cannot be blamed (much) because by using \pcode{snprintf} he +or she tried to make sure only 10 characters get copied into +the local buffer---in this way avoiding the obvious buffer +overflow attack. \subsubsection*{Caveats and Defences} -How can we defend against these attacks? Well, a reflex could -be to blame programmers. Precautions should be taken so that -buffers cannot been overfilled and format strings should not -be forgotten. This might actually be slightly simpler nowadays -since safe versions of the library functions exists, which -always specify the precise number of bytes that should be -copied. Compilers also nowadays provide warnings when format -strings are omitted. So proper education of programmers is -definitely a part of a defence against such attacks. However, -if we leave it at that, then we have the mess we have today -with new attacks discovered almost daily. +How can we defend against these attacks? Well, a reflex could +be to blame programmers. Precautions should be taken by them +so that buffers cannot been overfilled and format strings +should not be forgotten. This might actually be slightly +simpler nowadays since safe versions of the library functions +exist, which always specify the precise number of bytes that +should be copied. Compilers also nowadays provide warnings +when format strings are omitted. So proper education of +programmers is definitely a part of a defence against such +attacks. However, if we leave it at that, then we have the +mess we have today with new attacks discovered almost daily. There is actually a quite long record of publications proposing defences against buffer overflow attacks. One method @@ -711,9 +714,11 @@ attacks. \bigskip -\noindent If you want to know more about buffer overflow -attacks, the original Phrack article ``Smashing The Stack For -Fun And Profit'' by Elias Levy (also known as Aleph One) is an +\subsubsection*{Further Reading} + +If you want to know more about buffer overflow attacks, the +original Phrack article ``Smashing The Stack For Fun And +Profit'' by Elias Levy (also known as Aleph One) is an engaging read: \begin{center} diff -r 57269d9931da -r 8a42736cce27 handouts/ho04.pdf Binary file handouts/ho04.pdf has changed diff -r 57269d9931da -r 8a42736cce27 handouts/ho04.tex --- a/handouts/ho04.tex Tue Oct 28 16:33:53 2014 +0000 +++ b/handouts/ho04.tex Wed Oct 29 13:08:11 2014 +0000 @@ -285,13 +285,13 @@ \subsubsection*{Multi-Agent Access Control} In military or banking, for example, very critical decisions -need to be made using a \emph{two-men rule}. This means such +need to be made using a \emph{two-people rule}. This means such decisions need to be taken by two people together, so that no single person can defraud a bank or start a nuclear war (you will know what I mean if you have seen the classic movie ``Dr Strangelove or: How I Learned to Stop Worrying and Love the Bomb''\footnote{\url{http://en.wikipedia.org/wiki/Dr._Strangelove}}). -Translating the two-men rule into a software system seems not +Translating the two-people rule into a software system seems not as straightforward as one might think. Let us assume we want to implement a system where CEOs can diff -r 57269d9931da -r 8a42736cce27 handouts/ho05.pdf Binary file handouts/ho05.pdf has changed diff -r 57269d9931da -r 8a42736cce27 handouts/ho05.tex --- a/handouts/ho05.tex Tue Oct 28 16:33:53 2014 +0000 +++ b/handouts/ho05.tex Wed Oct 29 13:08:11 2014 +0000 @@ -7,27 +7,170 @@ \section*{Handout 5 (Protocols)} +Protocols are the computer science equivalent to fractals and +the Mandelbrot set in mathematics. With the latter you have a +simple formula which you just iterate and then you test +whether a point is inside or outside a region, and voila +something magically +happened.\footnote{\url{http://en.wikipedia.org/wiki/Fractal}, +\url{http://en.wikipedia.org/wiki/Mandelbrot_set}} Protocols +are similar: they are simple exchanges of messages, but in the +end something ``magical'' can happen---for example a secret +channel has been established or two entities have +authenticated themselves to each other. The problem with magic +is of course it is poorly understood and even experts often +got, and get, it wrong with protocols. + +To have an idea what kind of protocols we are interested, let +us look at a few examples. One example are (wireless) key +fobs which operate the central locking system and the +ignition in a car. + +\begin{center} +\includegraphics[scale=0.075]{../pics/keyfob.jpg} +\quad +\includegraphics[scale=0.2025]{../pics/startstop.jpg} +\end{center} + +\noindent The point of these key fobs is that everything is +done over the ``air''---there is no physical connection +between the key, doors and engine. So we must achieve security +by exchanging certain messages between the key fob on one side +and doors and engine on the other. Clearly what we like to +achieve is that I can get into my car and start it, but that +thieves are kept out. The problem is that everybody can +``overhear'' or skim the exchange of messages between the key +fob and car. In this scenario the simplest attack you need to +defend against is a person-in-the-middle attack. Imagine you +park your car in front of a supermarket. One thief follows you +with a strong transmitter. A second thief ``listens'' to the +signal from the car and wirelessly transmits it to the +``colleague'' who followed you and who silently enquires about +the answer from the key fob. The answer is then send back to +the thief at the car, which then dutifully opens and possibly +starts. No need to steal your key anymore. + +But there are many more such protocols we like to consider. +Other examples are wifi---you might sit at a Starbucks and +talk wirelessly to the free access point there and from there +talk with your bank, for example. Also even if your have to +touch your Oyster card at the reader each time you enter and +exit the Tube, it actually operates wirelessly and with +appropriate equipment over some quite large distance. But +there are many many more examples (Bitcoins, mobile +phones,\ldots). The common characteristics of the protocols we +are interested in here is that an adversary or attacker is +assumed to be in complete control over the network or channel +over which you exchanging messages. An attacker can install a +packet sniffer on a network, inject packets, modify packets, +replay old messages, or fake pretty much everything. In this +hostile environment, the purpose of protocols (that is +exchange of messages) is to achieve some security goal, for +example only allow the owner of the car in but everybody else +should be kept out. + The protocols we are interested here are generic descriptions of how to exchange messages in order to achieve a goal, be it establishing a mutual secure connection or being able to -authenticate to a system. Our notion of protocol is -deliberately quite general: it includes situations like the -messages send between a key fob and a car in order to open -doors or the messages that participants need to exchange in -order to mine Bitcoins (which is often already called Bitcoin -\emph{protocol}). +authenticate to a system. Unlike the distant past where for +example we had to meet a person in order to authenticate him +or her (via a passport for example), the problem we are facing +on the Internet is that we cannot easily be sure who we are +``talking'' to. The obvious reason is that only some electrons +arrive at our computer; we do not see the person, or computer, +behind the incoming electrons (messages). + +To start, let us look at one of the simplest protocols that +are part of the TCP protocol (which underlies the Internet). +This protocol does not do anything security relevant, it just +establishes a ``hello'' from a client to a server which the +server answers with ``I heard you'' and the client answers +in turn with something like ``thanks''. This protocol +is often called a \emph{three-way handshake}. Graphically it +can be illustrated as follows + +\begin{center} +\includegraphics[scale=0.5]{../pics/handshake.png} +\end{center} + +\noindent On the left-hand side is a client, say Alice, on the +right-hand side is a server, say. Time is running from top to +bottom. Alice initial SYN message needs some time to travel to +the server. The server answers with SYN-ACK, which will +require some time to arrive at Alice. Her answer ACK will +again take some time to arrive at the server. After the +messages are exchanged Alice and the server simply have +established a channel to communicate over. Alice does +not know whether she is really talking to the server (somebody +else on the network might have intercepted her message +and replied in place of the server). Similarly, the +server has no idea who it is talking to. That this can be +established depends on what is exchanged next and is the +point of the protocols we want to study in more detail. + +Before we start in earnest, we need to fix a more +convenient notation for protocols. Drawing pictures like +the one above would be awkward in the long-run. The +notation already abstracts away from a few details we are +not interested in: for example the time the messages +need to travel between endpoints. What we are interested +in is in which order the messages are sent. For the SYN-ACK +protocol we will therefore use the notation -Unlike the distant past where for example we had to meet a -person in order to authenticate him or her (via a passport for -example), the problem we are facing is that on the Internet we -cannot easily be sure who we are ``talking'' to. The obvious -reason is that only some electrons arrive at our computer; we -do not see the person, or computer, behind the incoming -electrons. Often there are is also no person behind the -messages, rather than a computer system. +\begin{center} +\begin{tabular}{l@{\hspace{2mm}}l} +$A \to S$: & $SYN$\\ +$S \to A$: & $SYN\_ACK$\\ +$A \to S$: & $ACK$\\ +\end{tabular} +\end{center} + +\noindent The left-hand side specifies who is the sender and +who is the receiver of the message. On the right of the colon +is the message that is send. The order from top to down +specifies in which order the messages are sent. We also +have the convention that messages like above $SYN$ are send +in clear-text over the network. If we want that a message is +encrypted, then we use the notation + +\[ +\{msg\}_{K_{AB}} +\] + + +\noindent for messages. The curly braces indicate a kind of +envelope which can only be opened if you know the key $K_{AB}$ +with which the message has been encrypted. We always assume +that an attacker, say Eve, cannot get the content of the +message, unless she is also in the possession of the key. We +explicitly exclude in our study that the encryption can be +broken.\footnote{\ldots{}which of course is what a good +protocol designer needs to ensure and more often than not +protocols are broken. For example Oyster cards contain a very +weak encryption mechanism which has been attacked.} It is also +possible that an encrypted message contains several parts. In +this case we would write something like + +\[ +\{msg_1, msg_2\}_{K_{AB}} +\] + +\noindent But again Eve would not be able to know +this unless she also has the key. We also allow the +possibility that a message is encrypted twice under +different keys. In this case we write + +\[ +\{\{msg\}_{K_{AB}}\}_{K_{BC}} +\] +Note, however, +while an attacker cannot obtain the content of the message +without the key, this encrypted message can be observed +and be recorded and then replayed at another time. + Keyfobs - protocol {\small