\documentclass{article}\usepackage{../style}\usepackage{../langs}\usepackage{../graphics}\usepackage{../data}%https://pentestn00b.wordpress.com/safe-to-hack-sites/%http://pastebin.com/raw/0SNSvyjJ%http://seclists.org/oss-sec/2016/q1/645% relative new buffer-overflow attack%https://www.trustwave.com/Resources/SpiderLabs-Blog/How-I-Cracked-a-Keylogger-and-Ended-Up-in-Someone-s-Inbox/% using an exploit from 2010% https://cve.mitre.org/cgi-bin/cvename.cgi?name=cve-2010-3333% https://github.com/cs01/gdbgui/% gdb frontend\begin{document}\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015, 2016, 2017}\section*{Handout 3 (Buffer Overflow Attacks)}\begin{flushright}\begin{minipage}{9cm}\it ``We took a network that was designed to be resilient to nuclear war andwe made it vulnerable to toasters.''\\\mbox{}\hfill\small --- Eben Upton, 2017, RPi co-founder\end{minipage}\end{flushright}\bigskip\noindentBy far the most popular attack method on computers are bufferoverflow attacks or variations thereof. The first Internetworm (Morris) exploited exactly such an attack. The popularityis unfortunate because we nowadays have technology in place toprevent them effectively. But these kind of attacks are stillvery relevant even today since there are many legacy systemsout there and also many modern embedded systems often do nottake any precautions to prevent such attacks. The plot belowshows the percentage of buffer overflow attacks listed in theUS National Vulnerability Database.\footnote{Search for``Buffer errors'' in the advanced serach tab at\url{http://web.nvd.nist.gov/view/vuln/statistics}.}\begin{center}\begin{tikzpicture}\begin{axis}[ xlabel={year}, ylabel={\% of total attacks}, ylabel style={yshift=-1em}, enlargelimits=false, xtick={1997,1999,2001,...,2017}, xmin=1996.5, xmax=2018, ymax=21, ytick={0,5,...,20}, scaled ticks=false, axis lines=left, width=12cm, height=5cm, ybar, nodes near coords= {\footnotesize $\pgfmathprintnumber[fixed,fixed zerofill,precision=1,use comma]{\pgfkeysvalueof{/data point/y}}$}, x tick label style={font=\footnotesize,/pgf/number format/1000 sep={}}]\addplot table [x=Year,y=Percentage] {bufferoverflows.data};\end{axis}\end{tikzpicture}\end{center}\noindent This statistics shows that in the last seven years or so thenumber of buffer overflow attacks is around 10\% of all attacks(whereby the absolute numbers of attacks grow each year). So you cansee buffer overflow attacks are very relevant today. For example, veryrecently (February 2016) a buffer overflow attack was discovered in the glibclibrary:\footnote{See \url{http://goo.gl/De2mA8}}\begin{quote}\it``Since 2008, a vulnerability has left apps and hardware open to remote hijacking: Researchers have discovered a potentially catastrophic flaw in one of the Internet's core building blocks that leaves hundreds or thousands of apps and hardware devices vulnerable to attacks that can take complete control over them. The vulnerability was introduced in 2008 in GNU C Library, a collection of open source code that powers thousands of standalone applications and most distributions of Linux, including those distributed with routers and other types of hardware. A function known as getaddrinfo() that performs domain-name lookups contains a buffer overflow bug that allows attackers to remotely execute malicious code. It can be exploited when vulnerable devices or apps make queries to attacker-controlled domain names or domain name servers or when they're exposed to man-in-the-middle attacks where the adversary has the ability to monitor and manipulate data passing between a vulnerable device and the open Internet. All versions of glibc after 2.9 are vulnerable.''\end{quote}To understand how buffer overflow attacks work, we have to havea look at how computers work ``under the hood'' (on themachine level) and also understand some aspects of the C/C++programming language. This might not be everyday fare forcomputer science students, but who said that criminal hackersrestrict themselves to everyday fare? ...not to mention thefree-riding script-kiddies who use this technology withouteven knowing what the underlying ideas are. If you want to bea good security engineer who needs to defend against such attacks, then better you get to know the details too.For buffer overflow attacks to work, a number of innocentdesign decisions, which are really benign on their own, haveto come together. All these decisions were taken at a timewhen there was no Internet: C was introduced around 1973; theInternet TCP/IP protocol was standardised in 1982 by whichtime there were maybe 500 servers connected (and all userswere well-behaved, mostly academics); Intel's first 8086 CPUsarrived around 1977. So nobody of the ``forefathers'' canreally be blamed, but as mentioned above we should already beway beyond the point that buffer overflow attacks are worth athought. Unfortunately, this is far from the reality. I let youponder why?One such ``benign'' design decision is how the memory is laidout into different regions for each process. \begin{center} \begin{tikzpicture}[scale=0.7] %\draw[step=1cm] (-3,-3) grid (3,3); \draw[line width=1mm] (-2, -3) rectangle (2,3); \draw[line width=1mm] (-2,1) -- (2,1); \draw[line width=1mm] (-2,-1) -- (2,-1); \draw (0,2) node {\large\tt text}; \draw (0,0) node {\large\tt heap}; \draw (0,-2) node {\large\tt stack}; \draw (-2.7,3) node[anchor=north east] {\tt\begin{tabular}{@{}l@{}}lower\\ address\end{tabular}}; \draw (-2.7,-3) node[anchor=south east] {\tt\begin{tabular}{@{}l@{}}higher\\ address\end{tabular}}; \draw[->, line width=1mm] (-2.5,3) -- (-2.5,-3); \draw (2.7,-2) node[anchor=west] {\tt grows}; \draw (2.7,-3) node[anchor=south west] {\tt\footnotesize older}; \draw (2.7,-1) node[anchor=north west] {\tt\footnotesize newer}; \draw[|->, line width=1mm] (2.5,-3) -- (2.5,-1); \end{tikzpicture}\end{center}\noindent The text region contains the program code (usuallythis region is read-only). The heap stores all data theprogrammer explicitly allocates. For us the most interestingregion is the stack, which contains data mostly associatedwith the control flow of the program. Notice that the stackgrows from higher addresses to lower addresses (i.e.~from theback to the front). That means that older items on the stackare stored behind, or after, newer items. Let's look a bitcloser what happens with the stack when a program is running.Consider the following simple C program.\begin{minipage}{\textwidth}\lstinputlisting[language=C]{../progs/example1.c} \end{minipage}\noindent The \code{main} function calls in Line 7 thefunction \code{foo} with three arguments. \code{Foo} createstwo (local) buffers, but does not do anything interesting withthem. The only purpose of this program is to illustrate whathappens behind the scenes with the stack. The interestingquestion is what will the stack look like after Line 3 hasbeen executed? The answer is illustrated in Figure~\ref{stack}.\begin{figure} \begin{center} \begin{tikzpicture}[scale=0.65] \draw[gray!20,fill=gray!20] (-5, 0) rectangle (-3,-1); \draw[line width=1mm] (-5,-1.2) -- (-5,0.2); \draw[line width=1mm] (-3,-1.2) -- (-3,0.2); \draw (-4,-1) node[anchor=south] {\tt main}; \draw[line width=1mm] (-5,0) -- (-3,0); \draw[gray!20,fill=gray!20] (3, 0) rectangle (5,-1); \draw[line width=1mm] (3,-1.2) -- (3,0.2); \draw[line width=1mm] (5,-1.2) -- (5,0.2); \draw (4,-1) node[anchor=south] {\tt main}; \draw[line width=1mm] (3,0) -- (5,0); %\draw[step=1cm] (-3,-1) grid (3,8); \draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1); \draw[line width=1mm] (-1,-1.2) -- (-1,7.4); \draw[line width=1mm] ( 1,-1.2) -- ( 1,7.4); \draw (0,-1) node[anchor=south] {\tt main}; \draw[line width=1mm] (-1,0) -- (1,0); \draw (0,0) node[anchor=south] {\tt arg$_3$=3}; \draw[line width=1mm] (-1,1) -- (1,1); \draw (0,1) node[anchor=south] {\tt arg$_2$=2}; \draw[line width=1mm] (-1,2) -- (1,2); \draw (0,2) node[anchor=south] {\tt arg$_1$=1}; \draw[line width=1mm] (-1,3) -- (1,3); \draw (0,3.1) node[anchor=south] {\tt ret}; \draw[line width=1mm] (-1,4) -- (1,4); \draw (0,4) node[anchor=south] {\small\tt last sp}; \draw[line width=1mm] (-1,5) -- (1,5); \draw (0,5) node[anchor=south] {\tt buf$_1$}; \draw[line width=1mm] (-1,6) -- (1,6); \draw (0,6) node[anchor=south] {\tt buf$_2$}; \draw[line width=1mm] (-1,7) -- (1,7); \draw[->,line width=0.5mm] (1,4.5) -- (1.8,4.5) -- (1.8, 0) -- (1.1,0); \draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5); \draw (2.6,3.1) node[anchor=south west] {\tt back to main()};\end{tikzpicture}\end{center}\caption{The stack layout for a program where the mainfunction calls an auxiliary function with three arguments(1,2 and 3). The auxiliary function has two localbuffer variables {\tt buf}$_1$ and {\tt buf}$_2$.\label{stack}} \end{figure}On the left is the stack before \code{foo} is called; on theright is the stack after \code{foo} finishes. The before andafter of the stack looks the same.The functioncall to \code{foo} in Line 7 (in the C program above) pushesthe arguments onto the stack in reverse order---shown in themiddle. Therefore first 3 then 2 and finally 1. Then it pushesthe return address onto the stack where execution shouldresume once \code{foo} has finished. The last stack pointer(\code{sp}) is needed in order to clean up the stack to thelast level---in fact there is no cleaning involved, but justthe top of the stack will be set back to this address. So thelast stack pointer also needs to be stored. The two buffersinside \pcode{foo} are on the stack too, because they arelocal data within \code{foo}. Consequently the stack in themiddle of Figure~\ref{stack} is a snapshot after Line 3 hasbeen executed. In case you are familiar with assembly instructions you canalso read off this behaviour from the machine code that the\code{gcc} compiler generates for the programabove:\footnote{You can make \pcode{gcc} generate assemblyinstructions if you call it with the \pcode{-S} option, forexample \pcode{gcc -S out in.c}\;. Or you can look at thiscode by using the debugger. How to do this will be explainedin the last section.} It generates the following code for the\pcode{main} and \pcode{foo} functions.\begin{center}\small\begin{tabular}[t]{p{11cm}}{\lstinputlisting[language={[x86masm]Assembler}, morekeywords={movl},xleftmargin=5mm] {../progs/example1a.s}}\end{tabular}\end{center}\noindent Again you can see how the function \pcode{main}prepares in Lines 2 to 7 the stack before calling the function\pcode{foo}. You can see that the numbers 3, 2, 1 are storedon the stack (the register \code{\%esp} refers to the top ofthe stack; \pcode{$0x1}, \pcode{$0x2} \pcode{$0x3} are thehexadecimal encodings for \pcode{1} to \pcode{3}). The codefor the foo function is as follows:\begin{center}\small\begin{tabular}[t]{p{11cm}}{\lstinputlisting[language={[x86masm]Assembler}, morekeywords={movl,movw},xleftmargin=5mm] {../progs/example1b.s}} \end{tabular}\end{center}\noindent The code for function \pcode{foo} storesfirst the last stack pointer onto the stack and thencalculates the new stack pointer to have enough space for thetwo local buffers (Lines 2 - 4). Then it puts the two localbuffers onto the stack and initialises them with the givendata (Lines 5 to 9). Since there is no real computation goingon inside \pcode{foo}, the function then just restores thestack to its old state (Line 10) and crucially sets the returnaddress where the computation should resume (Line 10). Theinstruction \code{ret} then transfers control back to thefunction \pcode{main} to the instruction just after the callto \pcode{foo}, that is Line 10.Another part of the ``conspiracy'' of buffer overflow attacksis that library functions in C look typically as follows:\begin{center}\lstinputlisting[language=C,numbers=none]{../progs/app5.c}\end{center} \noindent This function copies data from a source \pcode{src}to a destination \pcode{dst}. The important point is that itcopies the data until it reaches a zero-byte (\code{"\\0"}). This is a convention of the C language which assumes allstrings are terminated by such a zero-byte.The central idea of the buffer overflow attack is to overwritethe return address on the stack. This address decides wherethe control flow of the program should resume once thefunction at hand has finished its computation. So if we can control this address, then we can modify the controlflow of a program. To launch an attack we need somewhere in a function a local a buffer, say\begin{center}\code{char buf[8];}\end{center}\noindent which is filled by some user input. Thecorresponding stack of such a function will look as follows\begin{center} \begin{tikzpicture}[scale=0.65] %\draw[step=1cm] (-3,-1) grid (3,8); \draw[line width=1mm] (-1,1.2) -- (-1,6.4); \draw[line width=1mm] ( 1,1.2) -- ( 1,6.4); \draw (0,2) node[anchor=south] {\ldots}; \draw[line width=1mm] (-1,3) -- (1,3); \draw (0,3.1) node[anchor=south] {\tt ret}; \draw[line width=1mm] (-1,4) -- (1,4); \draw (0,4) node[anchor=south] {\small\tt last sp}; \draw[line width=1mm] (-1,5) -- (1,5); \draw (0,5.1) node[anchor=south] {\tt buf}; \draw[line width=1mm] (-1,6) -- (1,6); \draw (2,5.1) node[anchor=south] {\code{$esp}}; \draw[<-,line width=0.5mm] (1.1,6) -- (2.5,6); \draw[->,line width=0.5mm] (1,4.5) -- (2.5,4.5); \draw (2.6,4.1) node[anchor=south west] {\code{??}}; \draw[->,line width=0.5mm] (1,3.5) -- (2.5,3.5); \draw (2.6,3.1) node[anchor=south west] {\tt jump to \code{\\x080483f4}};\end{tikzpicture}\end{center}\noindent We need to fill this buffer over its limit of 8characters so that it overwrites the stack pointer and thenalso overwrites the return address. If, for example, we wantto jump to a specific address in memory, say,\pcode{\\x080483f4} then we can fill the buffer with the data\begin{center}\code{char buf[8] = "AAAAAAAABBBB\\xf4\\x83\\x04\\x08";}\end{center}\noindent The first eight \pcode{A}s fill the buffer to therim; the next four \pcode{B}s overwrite the stack pointer(with what data we overwrite this part is usually notimportant); then comes the address we want to jump to. Noticethat we have to give the address in the reverse order. Alladdresses on Intel CPUs need to be given in this way. Sincethe string is enclosed in double quotes, the C convention isthat the string internally will automatically be terminated bya zero-byte. If the programmer uses functions like\pcode{strcpy} for filling the buffer \pcode{buf}, then we canbe sure it will overwrite the stack in this manner---since itwill copy everything up to the zero-byte. Notice that thisoverwriting of the buffer only works since the neweritem---the buffer---is stored on the stack before the olderitems, like return address and arguments. If it had be theother way around, then such an overwriting by overflowing alocal buffer would just not work. Had the designers of C been able to foresee what headaches their way ofarranging the stack will cause, how different could bethe IT-World today?What the outcome of such an attack is can be illustrated withthe code shown in Figure~\ref{C2}. Under ``normal operation''this program ask for a login-name and a password. Both ofwhich are stored in \code{char} buffers of length 8. Thefunction \pcode{match} tests whether two such buffers containthe same content. If yes, then the function lets you ``in''(by printing \pcode{Welcome}). If not, it denies access (byprinting \pcode{Wrong identity}). The vulnerable function is\code{get_line} in Lines 11 to 19. This function does not takeany precautions about the buffer of 8 characters being filledbeyond its 8-character-limit. Let us suppose the login name is\pcode{test}. Then the buffer overflow can be triggered with aspecially crafted string as password (remember\pcode{get\_line} requires a \pcode{\\n} at the end of theinput):\begin{center}\code{AAAAAAAABBBB\\x2c\\x85\\x04\\x08\\n}\end{center}\noindent The address at the end happens to be the one for thefunction \pcode{welcome()}. This means even with this input(where the login name and password clearly do not match) theprogram will still print out \pcode{Welcome}. The onlyinformation we need for this attack to work is to know wherethe function \pcode{welcome()} starts in memory. Thisinformation can be easily obtained by starting the programinside the debugger and disassembling this function. \begin{lstlisting}[numbers=none,language={[x86masm]Assembler}, morekeywords={movl,movw}]$ gdb C2GNU gdb (GDB) 7.2-ubuntu(gdb) disassemble welcome\end{lstlisting}\noindent \pcode{C2} is the name of the program and\pcode{gdb} is the name of the debugger. The output will besomething like this\begin{lstlisting}[numbers=none,language={[x86masm]Assembler}, morekeywords={movl,movw}]0x0804852c <+0>: push %ebp0x0804852d <+1>: mov %esp,%ebp0x0804852f <+3>: sub $0x4,%esp0x08048532 <+6>: movl $0x8048690,(%esp)0x08048539 <+13>: call 0x80483a4 <puts@plt>0x0804853e <+18>: movl $0x0,(%esp)0x08048545 <+25>: call 0x80483b4 <exit@plt>\end{lstlisting}\noindent indicating that the function \pcode{welcome()}starts at address \pcode{0x0804852c} (top address in the left column).\begin{figure}[p]\lstinputlisting[language=C]{../progs/C2.c}\caption{A vulnerable login implementation. The use of the `own' \code{get\_line} function makes this program vulnerable. The developer should have used \emph{safe} library functions instead.\label{C2}}\end{figure}This kind of attack was very popular with commercial programsthat needed a key to be unlocked. Historically, hackers first broke the rather weak encryption of these locking mechanisms.After the encryption had been made stronger, hackers usedbuffer overflow attacks as shown above to jump directly tothe part of the program that was intended to be only availableafter the correct key was typed in. \subsection*{Payloads}Unfortunately, much more harm can be caused by buffer overflowattacks. This is achieved by injecting code that will be runonce the return address is appropriately modified. Typicallythe code that will be injected starts a shell. This gives theattacker the ability to run programs on the target machine andto have a good look around in order to obtain also full rootaccess (normally the program that is attacked would run withlesser rights and any shell injected would also only run withthese lesser access rights). If the attacked program was already running as root, then the attacker can congratulatehim or herself to another computer under full control\ldotsno more work to be done.In order to be send as part of the string that is overflowingthe buffer, we need the code for starting the shell to berepresented as a sequence of characters. For example\lstinputlisting[language=C,numbers=none]{../progs/o1.c}\noindent These characters represent the machine code foropening a shell. It seems obtaining such a string requires``higher-education'' in the architecture of the target system.But it is actually relatively simple: First there are manysuch strings ready-made---just a quick Google query away.A nice selection of ready-made shell-codes can be found for example at\begin{center}\url{http://shellblade.net/shellcode.html}\end{center}Second, tools like the debugger can help us again. We can justwrite the code we want in C, for example this would be theprogram for starting a shell:\lstinputlisting[language=C,numbers=none]{../progs/shell.c} \noindent Once compiled, we can use the debugger to obtain themachine code, or even get the ready-made encoding as charactersequence. \lstinputlisting[language=C,numbers=none]{../progs/o2.c}\noindentWhile not too difficult, obtaining this string is not entirelytrivial using \pcode{gdb}. Remember the functions in C thatcopy or fill buffers work such that they copy everything untilthe zero byte is reached. Unfortunately the ``vanilla'' outputfrom the debugger for the shell-program above contains suchzero bytes. So a post-processing phase is needed to rewritethe machine code in a way that it does not contain any zerobytes. This is like some works of literature that have beenwritten so that the letter e, for example, is avoided. Thetechnical term for such a literature work is\emph{lipogram}.\footnote{The most famous example of alipogram is a 50,000 words novel titled Gadsby, see\url{https://archive.org/details/Gadsby}, which avoids theletter `e' throughout.} For rewriting the machine code, youmight need to use clever tricks like\begin{lstlisting}[numbers=none,language={[x86masm]Assembler}]xor %eax, %eax\end{lstlisting}\noindent This instruction does not contain any zero-byte whenencoded as string, but produces a zero-byte on the stack whenrun. Having removed the zero-bytes we can craft the string thatwill be send to the target computer. This of course requiresthat the buffer we are trying to attack can at least containthe shellcode we want to run. But as you can see this is only47 bytes, which is a very low bar to jump over. Actually thereare optimised versions which only need 24 bytes. Moreformidable is the problem of finding the right address to jumpto. The string is typically of the form\begin{center} \begin{tikzpicture}[scale=0.6] \draw[line width=1mm] (-2, -1) rectangle (2,3); \draw[line width=1mm] (-2,1.9) -- (2,1.9); \draw (0,2.5) node {\large\tt shell code}; \draw[line width=1mm,fill=black] (0.3, -1) rectangle (2,-0.7); \draw[->,line width=0.3mm] (1.05, -1) -- (1.05,-1.7) -- (-3,-1.7) -- (-3, 3.7) -- (-1.9, 3.7) -- (-1.9, 3.1); \draw (-2, 3) node[anchor=north east] {\LARGE \color{codegreen}{``}}; \draw ( 2,-0.9) node[anchor=west] {\LARGE\color{codegreen}{''}}; \end{tikzpicture}\end{center}\noindent where we need to be very precise with the addresswith which we will overwrite the buffer (indicated as a blackrectangle). It has to be precisely the first byte of theshellcode. While this is easy with the help of a debugger (asseen before), we typically cannot run anything, including adebugger, on the machine yet we target. And the address isvery specific to the setup of the target machine. One way offinding out what the right address is is to try out one-by-oneevery possible address until we get lucky. With the largememories available today, however, the odds are long for this. And ifwe try out too many possible candidates too quickly, we mightbe detected by the system administrator of the target system.We can improve our odds considerably by making use of a veryclever trick. Instead of adding the shellcode at the beginningof the string, we should add it at the end, just before weoverflow the buffer, for example\begin{center} \begin{tikzpicture}[scale=0.6] \draw[gray!50,fill=gray!50] (-2,0.3) rectangle (2,3); \draw[line width=1mm] (-2, -1) rectangle (2,3); \draw[line width=1mm] (-2,0.3) -- (2,0.3); \draw[line width=1mm] (-2,-0.7) -- (2,-0.7); \draw (0,-0.2) node {\large\tt shell code}; \draw[line width=1mm,fill=black] (0.3, -1) rectangle (2,-0.7); \draw [line width=0.5,decoration={brace,amplitude=2mm},decorate] (2.3,3) -- (2.3,0.3); \draw[line width=0.3mm] (1.05, -1) -- (1.05,-1.7) -- (3,-1.7) -- (3,1.65) -- (2.6, 1.65); \draw (-2, 3) node[anchor=north east] {\LARGE \color{codegreen}{``}}; \draw ( 2,-0.9) node[anchor=west] {\LARGE\color{codegreen}{''}}; \end{tikzpicture}\end{center}\noindent Then we can fill up the grey part of the string with\pcode{NOP} operations. The code for this operation is\code{\\0x90} on Intel CPUs. It is available on everyarchitecture and its purpose in a CPU is to do nothing apartfrom waiting a small amount of time. If we now use an addressthat lets us jump to any address in the grey area we are done.The target machine will execute these \pcode{NOP} operationsuntil it reaches the shellcode. That is why this NOP-part isoften called \emph{NOP-sledge}. A moment of thought shouldconvince you that this trick can hugely improve our odds offinding the right address---depending on the size of thebuffer, it might only take a few tries to get the shellcode torun. And then we are in. The code for such an attack is shownin Figure~\ref{C3}. It is directly taken from the originalpaper about ``Smashing the Stack for Fun and Profit'' (seepointer given at the end).\begin{figure}[p]\lstinputlisting[language=C]{../progs/C3.c}\caption{Overwriting a buffer with a string containing a payload. Lines 14 and 15 write the address of the buffer into \code{large\_string}. The payload is copied in Lines 17 and 18. Line 20 copies the (too large) string into the buffer.\label{C3}}\end{figure}By the way you might now have the question how do attackersfind out about vulnerable systems in the first place? Well,the automated version uses \emph{fuzzers}, which throwrandomly generated user input at applications and observe thebehaviour. If an application segfaults (throws a segmentationerror) then this is a good indication that a buffer overflowvulnerability can be exploited.\subsubsection*{Format String Attacks}Another question might arise, where do we get all thisinformation about addresses necessary for mounting a bufferoverflow attack without having yet access to the system? Theanswer are \emph{format string attacks}. While technicallythey are programming mistakes (and they are pointed out aswarning by modern compilers), they can be easily made andtherefore an easy target. Let us look at the simplest versionof a vulnerable program.\lstinputlisting[language=C]{../progs/C4.c}\noindent The intention of this program is to print out thefirst argument given on the command line. The ``secretstring'' is never to be printed. The problem is that the Cfunction \pcode{printf} normally expects a format string---aschema that directs how a string should be printed. This wouldbe for example a proper invocation of this function:\begin{lstlisting}[numbers=none,language=C]long n = 123456789;printf("This is a long %lu!", n);\end{lstlisting}\noindent In the program above, instead, the format stringhas been forgotten and only \pcode{argv[1]} is printed.Now if we give on the command line a string such as\begin{center}\code{"foo \%s"}\end{center}\noindent then \pcode{printf} expects a string to follow. But there is no string that follows, and howthe argument resolution works in C will in fact print out the secret string! This can be handily exploited by using the format string \code{"\%x"}, which reads out the stack. So \code{"\%x....\%x"} will give you as much information from the stack as you need and over the Internet.While the program above contains clearly a programming mistake (forgotten format string), things are not as simplewhen the application reads data from the user and promptsresponses containing the user input. Consider the slightvariant of the program above\lstinputlisting[language=C]{../progs/C5.c}\noindent Here the programmer actually tried to take extracare to not fall prey to a buffer overflow attack, but in theprocess made the program susceptible to a format stringattack. Clearly the \pcode{printf} function in Line 7 containsnow an explicit format string, but because the commandlineinput is copied using the function \pcode{snprintf} the resultwill be the same---the string can be exploited by embeddingformat strings into the user input. Here the programmer reallycannot be blamed (much) because by using \pcode{snprintf} heor she tried to make sure only 10 characters get copied intothe local buffer---in this way avoiding the obvious bufferoverflow attack.\subsubsection*{Caveats and Defences}How can we defend against these attacks? Well, a reflex couldbe to blame programmers. Precautions should be taken by themso that buffers cannot be overfilled and format stringsshould not be forgotten. This might actually be slightlysimpler to achieve by programmers nowadays since safe versionsof the library functions exist, which always specify theprecise number of bytes that should be copied. Compilers alsonowadays provide warnings when format strings are omitted. Soproper education of programmers is definitely a part of adefence against such attacks. However, if we leave it at that,then we have the mess we have today with new attacksdiscovered almost daily. There is actually a quite long record of publicationsproposing defences against buffer overflow attacks. One methodis to declare the stack data as not executable. In this way itis impossible to inject a payload as shown above which is thenexecuted once the stack is smashed. But this needs hardwaresupport which allows one to declare certain memory regions tobe not executable. Such a feature was not introduced beforethe Intel 386, for example. Also if you have a JIT(just-in-time) compiler it might be advantageous to havethe stack containing executable data. So it is definitely a trade-off.Anyway attackers have found ways around this defence: theydeveloped \emph{return-to-lib-C} attacks. The idea is to notinject code, but already use the code that is present at thetarget computer. The lib-C library, for example, alreadycontains the code for spawning a shell. With\emph{return-to-lib-C} one just has to find out where thiscode is located. But attackers can make good guesses. Another defence is called \emph{stack canaries}. The advantageis that they can be automatically inserted into compiled codeand do not need any hardware support. Though they will makeyour program run slightly slower. The idea behind \emph{stackcanaries} is to push a random number onto the stack justbefore local data is stored. For our very first function\pcode{foo} the stack would with a \emph{stack canary} look asfollows\begin{center}\begin{tikzpicture}[scale=0.65] %\draw[step=1cm] (-3,-1) grid (3,8); \draw[gray!20,fill=gray!20] (-1, 0) rectangle (1,-1); \draw[line width=1mm] (-1,-1.2) -- (-1,7.4); \draw[line width=1mm] ( 1,-1.2) -- ( 1,7.4); \draw (0,-1) node[anchor=south] {\tt main}; \draw[line width=1mm] (-1,0) -- (1,0); \draw (0,0) node[anchor=south] {\tt arg$_3$=3}; \draw[line width=1mm] (-1,1) -- (1,1); \draw (0,1) node[anchor=south] {\tt arg$_2$=2}; \draw[line width=1mm] (-1,2) -- (1,2); \draw (0,2) node[anchor=south] {\tt arg$_1$=1}; \draw[line width=1mm] (-1,3) -- (1,3); \draw (0,3.1) node[anchor=south] {\tt ret}; \draw[line width=1mm] (-1,4) -- (1,4); \draw (0,4) node[anchor=south] {\small\tt last sp}; \draw[line width=1mm] (-1,5) -- (1,5); \draw (0,5.1) node[anchor=south] {\tt\small\textcolor{red}{\textbf{random}}}; \draw[line width=1mm] (-1,6) -- (1,6); \draw (0,6) node[anchor=south] {\tt buf}; \draw[line width=1mm] (-1,7) -- (1,7); \end{tikzpicture}\end{center}\noindent The idea behind this random number is that when thefunction finishes, it is checked that this random number isstill intact on the stack. If not, then a buffer overflow hasoccurred. Although this is quite effective, it requires suitable support for generating random numbers. This is alwayshard to get right and attackers are happy to exploit the resulting weaknesses.Another defence is \emph{address space randomisation}. Thisdefence tries to make it harder for an attacker to guess addresses where code is stored. It turns out that addresseswhere code is stored is rather predictable. Randomising theplace where programs are stored mitigates this problem somewhat.As mentioned before, modern operating systems have thesedefences enabled by default and make buffer overflow attacksharder, but not impossible. Indeed, I---as an amateurattacker---had to explicitly switch off these defences. A real attacker would be more knowledgeable and not need thisshortcut.To explain BoAs, I run my examples under an Ubuntu version ``MaverickMeerkat'' from October 2010 and the gcc 4.4.5. I have nottried whether newer versions would work as well. I tested allexamples inside a virtualbox\footnote{\url{https://www.virtualbox.org}} insulating mymain system from any harm. When compiling the programs Icalled the compiler with the following options:\begin{center}\begin{tabular}{l@{\hspace{1mm}}l}\pcode{/usr/bin/gcc} & \pcode{-ggdb -O0}\\ & \pcode{-fno-stack-protector}\\ & \pcode{-mpreferred-stack-boundary=2}\\ & \pcode{-z execstack} \end{tabular}\end{center}\noindent The first two options are innocent as they instruct thecompiler to include debugging information and also producenon-optimised code (the latter makes the output of the code abit more predictable). The third is important as it switchesoff defences like the stack canaries. The fourth again makesit a bit easier to read the code. The final option makes thestack executable, thus the example in Figure~\ref{C3} works asintended. While this might be considered as a complete cheat....since Iexplicitly switched off all defences; I hope I was able conveythe point that this is actually not too far from realisticscenarios. I have shown you the classic version of the bufferoverflow attacks. Updated and more advanced variants do exist.With the standard defences switched on, you might want toargue buffer-overflow attacks have been solved on computers(desktops and servers) but the computing landscape of today ismuch wider than that. The main problem today are embeddedsystems against which attacker can equally cause a lot of harmand which are much less defended. Anthony Bonkoski makes asimilar argument in his security blog:\begin{center}\url{http://jabsoft.io/2013/09/25/are-buffer-overflows-solved-yet-a-historical-tale/}\end{center}There is one more rather effective defence against buffer overflow attacks: Why not using a safe language? Java at its inception was touted as a safe language because it hidesall explicit memory management from the user. This definitelyincurs a runtime penalty, but for bog-standard user-inputprocessing applications, speed is not of such an essence anymore. There are of course also many other programming languages that are safe, i.e.~immune to buffer overflowattacks.\bigskip\subsubsection*{Further Reading}If you want to know more about buffer overflow attacks, theoriginal Phrack article ``Smashing The Stack For Fun AndProfit'' by Elias Levy (also known as Aleph One) is anengaging read:\begin{center}\url{http://phrack.org/issues/49/14.html}\end{center} \noindent This is an article from 1996 and some parts arenot up-to-date anymore. The article called``Smashing the Stack in 2010''\begin{center}\url{http://www.mgraziano.info/docs/stsi2010.pdf}\end{center}\noindent updates, as the name says, most information to 2010.There is another Phrack article about return-into-lib(c) exploits from 2012:\begin{center}\url{http://phrack.org/issues/58/4.html}\end{center}\noindentThe main topic is about getting around the non-executability of stackdata (in case it is protected). This article gives some furtherpointers into the recent literature about buffer overflow attacks.Buffer overflow attacks are not just restricted to Linux and ``normal'' computers. There is a book\begin{quote}\rm ``iOS Hacker's Handbook'' by Miller et al, Wiley, 2012\end{quote}\noindentwhich seem to describe buffer overflow attacks on iOS. A book from thesame publisher exists also for Android (from 2014) which seem to alsofeature buffer overflow attacks. Alas I do not own copies of thesebooks.\subsubsection*{A Crash-Course for GDB}If you want to try out the examples from KEATS it might behelpful to know about the following commands of the GNU Debugger:\begin{itemize}\item \texttt{(l)ist n} -- lists the source file from line \texttt{n}, the number can be omitted \item \texttt{disassemble fun-name} -- shows the assembly code of a function\item \texttt{info registers} -- prints out the current content of all registers\item \texttt{run args} -- starts the program, potential arguments can be given\item \texttt{(b)reak line-number} -- sets break point\item \texttt{(c)ontinue} -- continue execution until next breakpoint\item \texttt{x/nxw addr} -- prints out \texttt{n} words starting from address \pcode{addr}, the address could be \code{$esp} for looking at the content of the stack\item \texttt{x/nxb addr} -- prints out \texttt{n} bytes \item \texttt{q} -- quits the debugger\item \texttt{tui enable} or \texttt{C-x C-a} --- switches intoTUI mode \end{itemize}\end{document}%%% Local Variables: %%% mode: latex%%% TeX-master: t%%% End: