\documentclass{article}
\usepackage{../style}
\usepackage{../langs}

\lstset{language=JavaScript}

\begin{document}

\section*{Handout 1 (Security Engineering)}

Much of the material and inspiration in this module is taken
from the works of Bruce Schneier, Ross Anderson and Alex
Halderman. I think they are the world experts in the area of
security engineering. I especially like that they argue that a
security engineer requires a certain \emph{security mindset}.
Bruce Schneier for example writes:

\begin{quote}
\it ``Security engineers --- at least the good ones --- see
the world differently. They can't walk into a store without
noticing how they might shoplift. They can't use a computer
without wondering about the security vulnerabilities. They
can't vote without trying to figure out how to vote twice.
They just can't help it.''
\end{quote}

\begin{quote}
\it ``Security engineering\ldots requires you to think
differently. You need to figure out not how something works,
but how something can be made to not work. You have to imagine
an intelligent and malicious adversary inside your system
\ldots, constantly trying new ways to
subvert it. You have to consider all the ways your system can
fail, most of them having nothing to do with the design
itself. You have to look at everything backwards, upside down,
and sideways. You have to think like an alien.''
\end{quote}

\noindent
In this module I would like to teach you this security
mindset. This might be a mindset that you think is very
foreign to you; after all we are all good citizens and do not
hack into things. I beg to differ: You already had this mindset
when, in school, you were thinking, at least
hypothetically, about ways in which you could cheat in an exam
(whether by hiding notes or by looking over the
shoulders of your fellow pupils). Right? To defend a system,
you need to have this kind of mindset and be able to think like
an attacker.
This will include understanding techniques that
can be used to compromise security and privacy in systems.
This will often lead to insights where well-intended
security mechanisms actually made a system less
secure.\medskip

\noindent
{\Large\bf Warning!} However, don't be evil! Using those
techniques in the real world may violate the law or King's
rules, and it may be unethical. Under some circumstances, even
probing for weaknesses of a system may result in severe
penalties, up to and including expulsion, fines and
jail time. Acting lawfully and ethically is your
responsibility. Ethics requires you to refrain from doing
harm. Always respect the privacy and rights of others. Do not
tamper with any of King's systems. If you try out a technique,
always make doubly sure you are working in a safe environment
so that you cannot cause any harm, not even accidentally.
Don't be evil. Be an ethical hacker.\medskip

\noindent
In this lecture I want to make you familiar with the
security mindset and dispel the myth that encryption is the
answer to all security problems (it is certainly often a part
of an answer, but almost never a sufficient one). This
is actually an important thread going through the whole
course: We will assume that encryption works perfectly, but
still attack ``things''. By ``works perfectly'' we mean that
we will assume encryption is a black box and, for example,
will not look at the underlying mathematics and break the
algorithms.\footnote{Fascinating though this might be.}

For a secure system, it seems, four requirements need to come
together: First a security policy (what is supposed to be
achieved?); second a mechanism (cipher, access controls,
tamper resistance etc); third the assurance we obtain from the
mechanism (the amount of reliance we can put on the mechanism)
and finally the incentives (the motive that the people
guarding and maintaining the system have to do their job
properly, and also the motive that the attackers have to try
to defeat your policy).
The last point is often overlooked,
but plays an important role. To illustrate this let us look at
an example.

\subsubsection*{Chip-and-PIN is Surely More Secure?}

The question is whether the Chip-and-PIN system used with
modern credit cards is more secure than the older method of
signing receipts at the till. At first glance the answer seems
obvious: Chip-and-PIN must be more secure and indeed improved
security was the central plank in the ``marketing speak'' of
the banks behind Chip-and-PIN. The earlier system was based on
a magnetic stripe or a mechanical imprint on the cards and
required customers to sign receipts at the till whenever they
bought something. This signature authorised the transactions.
Although in use for a long time, this system had some crucial
security flaws: cards could be cloned and signatures
could be forged.

Chip-and-PIN, as the name suggests, relies on data being
stored on a chip on the card and a PIN for
authorisation. Even though the banks involved trumpeted their
system as being absolutely secure and indeed fraud rates
initially went down, security researchers were not convinced
(especially the group around Ross Anderson). To begin with,
the Chip-and-PIN system introduced a ``new player'' into the
system that needed to be trusted: the PIN terminals and their
manufacturers. It was claimed that these terminals were
tamper-resistant, but needless to say this was a weak link in
the system, which criminals successfully attacked. Some
terminals were even so skilfully manipulated that they
transmitted skimmed PINs via built-in mobile phone
connections. To mitigate this flaw in the security of
Chip-and-PIN, you need to be able to vet quite closely the
supply chain of such terminals. This is something that is
mostly beyond the control of customers who need to use these
terminals.

To make matters worse for Chip-and-PIN, in around 2009 Ross
Anderson and his group were able to perform man-in-the-middle
attacks against Chip-and-PIN.
Essentially they made the
terminal think the correct PIN was entered and the card think
that a signature was used. This is a kind of \emph{protocol
failure}. After discovery, the flaw was mitigated by requiring
that a link between the card and the bank is established
every time the card is used. Even later this group found
another problem with Chip-and-PIN and ATMs which did not
generate random enough numbers (nonces) on which the security
of the underlying protocols relies.

The problem with all this is that the banks who introduced
Chip-and-PIN managed with the new system to shift the
liability for any fraud and the burden of proof onto the
customer. In the old system, the banks had to prove that the
customer used the card, which they often did not bother with.
In effect, if fraud occurred the customers were either refunded
fully or lost only a small amount of money. This
taking-responsibility-of-potential-fraud was part of the
``business plan'' of the banks and did not reduce their
profits too much.

Since banks managed to successfully claim that their
Chip-and-PIN system is secure, they were under the new system
able to point the finger at the customer when fraud occurred:
customers must have been negligent in losing their PIN, and they
had almost no way of defending themselves in such situations.
That is why the work of \emph{ethical} hackers like Ross
Anderson's group was so important, because they and others
established that the banks' claim that their system is secure
and that it must have been the customer's fault was bogus. In 2009
the law changed and the burden of proof went back to the
banks. They need to prove whether it was really the customer
who used a card or not.

This is a classic example where a security design principle
was violated: Namely, the one who is in the position to
improve security also needs to bear the financial losses if
things go wrong. Otherwise, you end up with an insecure
system.
In case of the Chip-and-PIN system, no good security
engineer would dare to claim that it is secure beyond
reproach: the specification of the EMV protocol (underlying
Chip-and-PIN) is some 700 pages long, but still leaves out
many things (like how to implement a good random number
generator). No human being is able to scrutinise such a
specification and ensure it contains no flaws. Moreover, banks
can add their own sub-protocols to EMV. With all the
experience we already have, it is as clear as day that
criminals were bound to eventually be able to poke holes into
it and that measures need to be taken to address them. However,
with how the system was set up, the banks had no real
incentive to come up with a system that is really secure.
Getting the incentives right in favour of security is often a
tricky business. From a customer point of view, the
Chip-and-PIN system was much less secure than the old
signature-based method. The customer could now lose
significant amounts of money.

\subsection*{Of Cookies and Salts}

Let us look at another example which will help with
understanding how passwords should be verified and stored.
Imagine you need to develop a web-application that has the
feature of recording how many times a customer visits a page,
for example in order to give a discount whenever the customer
has visited a webpage some $x$ number of times (say $x$ equal
$5$). There is one more constraint: we want to store the
information about the number of visits as a cookie on the
browser. I think, for a number of years the webpage of the New
York Times operated in this way: it allowed you to read ten
articles per month for free; if you wanted to read more, you
had to pay. My best guess is that it used cookies for
recording how many times its pages were visited, because if I
switched browsers I could easily circumvent the restriction
about ten articles.

To implement our web-application it is good to look under the
hood at what happens when a webpage is displayed in a browser.
A typical web-application works as follows: The browser sends a
GET request for a particular page to a server. The server
answers this request with a webpage in HTML (for our purposes
we can ignore the details about HTML). A simple JavaScript
program that realises a server answering with a ``hello
world'' webpage is as follows:

\begin{center}
\lstinputlisting{../progs/ap0.js}
\end{center}

\noindent
The interesting lines are 4 to 7 where the answer to
the GET request is generated\ldots in this case it is just a
simple string. This program is run on the server and will be
executed whenever a browser initiates such a GET request. You
can run this program on your computer and then direct a
browser to the address \pcode{localhost:8000} in order to
simulate a request over the internet.

Of interest for our web-application is the feature that the
server, when answering the request, can store some information
on the client's side. This information is called a
\emph{cookie}. The next time the browser makes another GET
request to the same webpage, this cookie can be read again by
the server. We can use cookies in order to store a counter
that records the number of times our webpage has been visited.
This can be realised with the following small program:

\begin{center}
\lstinputlisting{../progs/ap2.js}
\end{center}

\noindent
The overall structure of this program is the same as
the earlier one: Lines 7 to 17 generate the answer to a
GET-request. The new part is in Line 8 where we read the
cookie called \pcode{counter}. If present, this cookie will be
sent together with the GET-request from the client. The value
of this counter will come in form of a string, therefore we
use the function \pcode{parseInt} in order to transform it
into an integer. In case the cookie is not present, we default
the counter to zero. The odd looking construction \code{...||0}
is realising this defaulting in JavaScript.
In Line 9 we
increase the counter by one and store it back to the client
(under the name \pcode{counter}, since potentially more than
one value could be stored). In Lines 10 to 15 we test whether
this counter is greater than or equal to 5 and accordingly send a
specially crafted message back to the client.

Let us step back and analyse this program from a security
point of view. We store a counter in plain text on the
client's browser (which is not under our control). Depending
on this value we want to unlock a resource (like a discount)
when it reaches a threshold. If the client deletes the cookie,
then the counter will just be reset to zero. This does not
bother us, because the purported discount will just not be
granted. In this way we do not lose any (hypothetical) money.
What we need to be concerned about, however, is a client who
artificially increases this counter without having visited our
web-page. This is actually a trivial task for a knowledgeable
person, since there are convenient tools that allow one to set
a cookie to an arbitrary value, for example above our
threshold for the discount.

There seems to be no simple way to prevent this kind of
tampering with cookies, because the whole purpose of cookies
is that they are stored on the client's side, which from
the server's perspective is a potentially hostile environment.
What we need to ensure is the integrity of this counter in
this hostile environment. We could think of encrypting the
counter. But this has two drawbacks to do with the key for
encryption. If we use a single, global key for all the
clients that visit our site, then we risk that our whole
``business'' might collapse in the event this key becomes known
to the outside world. Then all cookies we might have set in
the past can be decrypted and manipulated.
If, on the
other hand, we use many ``private'' keys for the clients, then
we have to solve the problem of having to securely store these
keys on our server side (obviously we cannot store the key with
the client because then the client again has all data to
tamper with the counter; and obviously we also cannot encrypt
the key, unless we can solve an impossible chicken-and-egg
problem). So encryption seems to not solve the problem we face
with the integrity of our counter.

Fortunately, \emph{hash functions} seem to be more suitable
for our purpose. Like encryption, hash functions scramble data
in such a way that it is easy to calculate the output of a
hash function from the input. But it is hard (i.e.~practically
impossible) to calculate the input from knowing the output.
Therefore hash functions are often called \emph{one-way
functions}\ldots you cannot go back from the output to the
input (without some tricks, see below). There are several such
hash functions. For example SHA-1 would hash the string
\pcode{"hello world"} to produce the hash-value

\begin{center}
\pcode{2aae6c35c94fcfb415dbe95f408b9ce91ee846ed}
\end{center}

\noindent
Another handy feature of hash functions is that if
the input changes only a little, the output changes
drastically. For example \pcode{"iello world"} produces under
SHA-1 the output

\begin{center}
\pcode{d2b1402d84e8bcef5ae18f828e43e7065b841ff1}
\end{center}

\noindent
That means it is not predictable what the output
will be from just looking at input that is ``close by''.

We can use hashes in our web-application and store in the
cookie the value of the counter in plain text but together
with its hash. We need to store both pieces of data in such a
way that we can extract them again later on. In the code below
I will just separate them using a \pcode{"-"}, for example

\begin{center}
\pcode{1-356a192b7913b04c54574d18c28d46e6395428ab}
\end{center}

\noindent
for the counter \pcode{1}.
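
\noindent
The hash-value for \pcode{"hello world"} and the cookie value for
the counter \pcode{1} can be reproduced with a few lines of
JavaScript. The following is only a minimal sketch, assuming
Node.js and its built-in \pcode{crypto} module; the helper names
\pcode{sha1} and \pcode{mkCookie} are made up for illustration
and are not taken from the programs of this handout.

```javascript
// Sketch only: assumes Node.js with its built-in crypto module.
const crypto = require('crypto');

// sha1 is a hypothetical helper: hex-encoded SHA-1 hash of a string
function sha1(s) {
  return crypto.createHash('sha1').update(s).digest('hex');
}

// mkCookie is a hypothetical helper: the counter in plain text,
// a "-" as separator, and then the hash of the counter
function mkCookie(counter) {
  return counter + '-' + sha1(String(counter));
}

console.log(sha1('hello world'));
// prints 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
console.log(mkCookie(1));
// prints 1-356a192b7913b04c54574d18c28d46e6395428ab
```
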
If we now read back the
cookie when the client visits our webpage, we can extract the
counter, hash it again and compare the result to the stored
hash value inside the cookie. If these hashes disagree, then
we can deduce that the cookie has been tampered with.
Unfortunately, if they agree, we can still not be entirely
sure that a clever hacker has not tampered with the cookie.
The reason is that the hacker can see the clear text part of
the cookie, say \pcode{3}, and also its hash. It does not take
much trial and error to find out that we used the SHA-1
hashing function and then the hacker can craft a cookie
accordingly. This is eased by the fact that for SHA-1 many
strings and corresponding hash-values are precalculated. Type,
for example, into Google the hash value for \pcode{"hello world"}
and you will actually pretty quickly find that it was
generated by input string \pcode{"hello world"}. Similarly for
the hash-value for \pcode{1}. This defeats the purpose of a
hashing function and thus would not help us with our
web-applications and later also not with how to store
passwords properly.

There is one ingredient missing, which happens to be called
\emph{salts}. Salts are random keys, which are added to the
counter before the hash is calculated. In our case we must
keep the salt secret. As can be seen in Figure~\ref{hashsalt},
we need to extract from the cookie the counter value and its
hash (Lines 19 and 20). But before hashing the counter again
(Line 22) we need to add the secret salt. Similarly, when we
set the new increased counter, we will need to add the salt
before hashing (this is done in Line 15).
Our web-application
will now store cookies like

\begin{figure}[p]
\lstinputlisting{../progs/App4.js}
\caption{\label{hashsalt}}
\end{figure}

\begin{center}\tt
\begin{tabular}{l}
1 + salt - 8189effef4d4f7411f4153b13ff72546dd682c69\\
2 + salt - 1528375d5ceb7d71597053e6877cc570067a738f\\
3 + salt - d646e213d4f87e3971d9dd6d9f435840eb6a1c06\\
4 + salt - 5b9e85269e4461de0238a6bf463ed3f25778cbba\\
...\\
\end{tabular}
\end{center}

\noindent
These hashes allow us to read and set the value of
the counter, and also give us confidence that the counter has
not been tampered with. This of course depends on being able
to keep the salt secret. Once the salt is public, we had better
ignore all cookies and start setting them again with a new
salt.

There is an interesting and very subtle point to note with
respect to the New York Times' way of checking the number of
visits. Essentially they have their `resource' unlocked at the
beginning and lock it only when the data in the cookie states
that the allowed number of free visits is used up. As said before,
this can be easily circumvented by just deleting the cookie or
by switching the browser. This would mean the New York Times
will lose revenue whenever this kind of tampering occurs. The
quick fix to require that a cookie must always be present does
not work, because then this newspaper will cut off any new
readers, or anyone who gets a new computer. In contrast, our
web-application has the resource (discount) locked at the
beginning and only unlocks it if the cookie data says so. If
the cookie is deleted, well then the resource just does not
get unlocked. No major harm will result to us. You can see:
the same security mechanism behaves rather differently
depending on whether the ``resource'' needs to be locked or
unlocked. Apart from thinking about the difference very
carefully, I do not know of any good ``theory'' that could
help with solving such security intricacies in any other way.
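
To round off this section, the idea of a salted hash in a cookie
can be sketched in isolation. This is not the program from
Figure~\ref{hashsalt}, just a minimal illustration assuming
Node.js and its built-in \pcode{crypto} module; the salt value
and the helper names \pcode{mkCookie} and \pcode{readCookie} are
invented for this example.

```javascript
// Sketch only: assumes Node.js with its built-in crypto module.
const crypto = require('crypto');

// Assumption: a made-up salt for illustration; a real salt must be
// random and must stay secret on the server.
const SALT = 'my secret salt';

function sha1(s) {
  return crypto.createHash('sha1').update(s).digest('hex');
}

// cookie value: counter in plain text, "-", hash of counter plus salt
function mkCookie(counter) {
  return counter + '-' + sha1(counter + SALT);
}

// returns the counter if the hash matches, otherwise 0
// (a missing or tampered cookie is treated like a first visit)
function readCookie(cookie) {
  if (typeof cookie !== 'string') return 0;
  const [c, h] = cookie.split('-');
  const counter = parseInt(c, 10);
  if (Number.isNaN(counter)) return 0;
  return (h === sha1(counter + SALT)) ? counter : 0;
}

const cookie = mkCookie(3);
console.log(readCookie(cookie));                 // prints 3
console.log(readCookie('9' + cookie.slice(1)));  // prints 0, hash does not match
```

\noindent
Returning 0 for a missing or tampered cookie fits the observation
above that our resource stays locked by default: a client who
manipulates the cookie without knowing the salt only resets the
counter.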
\subsection*{How to Store Passwords Properly?}

While admittedly quite silly, the simple web-application in
the previous section should help with the more important
question of how passwords should be verified and stored. It is
hard to believe that systems nowadays still do this with
passwords in plain text. The idea behind such plain-text
passwords is of course that if the user typed in
\pcode{foobar} as password, we need to verify whether it
matches with the password that is already stored for this user
in the system. Why not do this with plain-text passwords?
Because doing this verification in plain text is really a bad
idea. Unfortunately, evidence suggests it is still a
widespread practice. I leave you to think about why verifying
passwords in plain text is a bad idea.

Using hash functions, like in our web-application, we can do
better. They allow us to avoid storing passwords in
plain text for verifying whether a password matches or not.
We can just hash the password and store the hash-value. And
whenever the user types in the password again, well then we hash
it again and check whether the hash-values agree. Just like
in the web-application before.

Let us analyse what happens when a hacker gets hold of such a
hashed password database. That is the scenario we want to
defend against.\footnote{If we could assume our servers can
never be broken into, then storing passwords in plain text
would be no problem. The point, however, is that servers are
never absolutely secure.} The hacker then has a list of user names and
associated hash-values, like

\begin{center}
\pcode{urbanc:2aae6c35c94fcfb415dbe95f408b9ce91ee846ed}
\end{center}

\noindent
For a beginner-level hacker this information is of
no use. It would not work to type in the hash value instead of
the password, because it will go through the hashing function
again and then the resulting two hash-values will not match.
One attack a hacker can try, however, is called a \emph{brute
force attack}.
Essentially this means trying out exhaustively
all strings

\begin{center}
\pcode{a},
\pcode{aa},
\pcode{...},
\pcode{ba},
\pcode{...},
\pcode{zzz},
\pcode{...}
\end{center}

\noindent
and so on, hashing them and checking whether they match
the hash-values in the database. Such brute force attacks
are surprisingly effective. With modern technology (usually
GPU graphic cards), passwords of moderate length only need
seconds or hours to be cracked. Well, the only defence we have
against such brute force attacks is to make passwords longer
and force users to use the whole spectrum of letters and keys
for passwords. The hope is that this makes the search space
too big for an effective brute force attack.

Unfortunately, clever hackers have another ace up their
sleeves. These are called \emph{dictionary attacks}. The idea
behind dictionary attacks is the observation that only a few
people are competent enough to use sufficiently strong
passwords. Most users (at least too many) use passwords like

\begin{center}
\pcode{123456},
\pcode{password},
\pcode{qwerty},
\pcode{letmein},
\pcode{...}
\end{center}

\noindent
So an attacker just needs to compile a list as large
as possible of such likely candidates of passwords and also
compute their hash-values. In contrast to a brute
force attack, where maybe $2^{80}$ strings need to be
considered, a dictionary attack might get away with checking
only 10 million (remember the English language ``only''
contains 600,000 words). This is a drastic simplification for
attackers. Now if the attacker knows the hash-value of a
password is

\begin{center}
\pcode{5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8}
\end{center}

\noindent
then just a lookup in the dictionary will reveal
that the plain-text password was \pcode{password}. What is
good about this attack is that the dictionary can be
precompiled in the ``comfort of the hacker's home'' before an
actual attack is launched. It just needs sufficient storage
space, which nowadays is pretty cheap.
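
\noindent
A precompiled dictionary is essentially a lookup table from
hash-values to candidate passwords. The following minimal sketch
assumes Node.js and unsalted SHA-1 hashes as in the example
above; the tiny word list and the names \pcode{dictionary} and
\pcode{crack} are made up for illustration.

```javascript
// Sketch only: assumes Node.js with its built-in crypto module.
const crypto = require('crypto');

function sha1(s) {
  return crypto.createHash('sha1').update(s).digest('hex');
}

// a toy word list; real dictionaries contain millions of candidates
const candidates = ['123456', 'password', 'qwerty', 'letmein'];

// precomputed once, before any attack: hash-value -> plain-text candidate
const dictionary = new Map(candidates.map(w => [sha1(w), w]));

// look up a stolen hash-value; undefined if it is not in the dictionary
function crack(hash) {
  return dictionary.get(hash);
}

console.log(crack('5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8'));
// prints password
```
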
A hacker might in this
way not be able to crack all passwords in our database, but
even being able to crack 50\% can be serious damage for a
large company (because then you have to think about how to
make users change their old passwords, a major hassle).
And hackers are very industrious in compiling these
dictionaries: for example they definitely include variations
like \pcode{passw0rd} and also include rules that cover cases
like \pcode{passwordpassword} or \pcode{drowssap} (password
reversed). Historically, compiling a list for a dictionary
attack was not as simple as it might seem. At the beginning
only ``real'' dictionaries were available (like the Oxford
English Dictionary), but such dictionaries are not
``optimised'' for the purpose of passwords. The first real
hard data about actually used passwords was obtained when a
company called RockYou ``lost'' 32 million plain-text
passwords. With this data of real-life passwords, dictionary
attacks took off. Compiling such dictionaries is nowadays very
easy with the help of off-the-shelf tools.

These dictionary attacks can be prevented by using salts.
Remember a hacker needs to use the most likely candidates
of passwords and calculate their hash-values. If we add
a random salt, like \pcode{mPX2aq}, to a password before hashing,
then the string \pcode{passwordmPX2aq} will almost certainly
not be in the dictionary. Like in the web-application in the
previous section, a salt does not prevent us from verifying a
password. We just need to add the salt whenever the password
is typed in again.

There is a question whether we should use a single random salt
for every password in our database. A single salt would
already make dictionary attacks considerably more difficult.
It turns out, however, that in case of password databases
every password should get its own salt. This salt is
generated at the time when the password is first set.
If you look at a Unix password file you will find entries like

\begin{center}
\pcode{urbanc:$6$3WWbKfr1$4vblknvGr6FcDeF92R5xFn3mskfdnEn...:...}
\end{center}

\noindent
where the first part is the login-name, followed by
a field \pcode{$6$} which specifies which hash-function is
used. After that follows the salt \pcode{3WWbKfr1} and after
that the hash-value that is stored for the password (which
includes the salt). I leave it to you to figure out how the
password verification would need to work based on this data.

There is a non-obvious benefit of using a separate salt for
each password. Recall that \pcode{123456} is a popular
password that is most likely used by several of your users
(especially if the database contains millions of entries). If
we use no salt or one global salt, all hash-values will be the
same for this password. So if a hacker is in the business of
cracking as many passwords as possible, then it is a good idea
to concentrate on those very popular passwords. This is not
possible if each password gets its own salt: since we assume
the salt is generated randomly, each version of \pcode{123456}
will be associated with a different hash-value. This will
make life harder for an attacker.

Note another interesting point. The web-application from the
previous section was only secure when the salt was secret. In
the password case, this is not needed. The salt can be public
as shown above in the Unix password file, where it is actually
stored as part of the password entry. Knowing the salt does
not give the attacker any advantage, while the salt still prevents
dictionaries from being precompiled. The moral is that you should
never store passwords in plain text. Never ever.
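
The effect of giving each password its own salt can be tried out
with a small sketch. This is only an illustration assuming
Node.js and its built-in \pcode{crypto} module: it uses a single
round of SHA-512 over password and salt, which is \emph{not} the
exact scheme behind the entries in a Unix password file, and the
entry format \pcode{user:salt:hash} as well as the names
\pcode{mkEntry} and \pcode{verify} are invented for this example.

```javascript
// Sketch only: assumes Node.js with its built-in crypto module.
const crypto = require('crypto');

function hash(s) {
  return crypto.createHash('sha512').update(s).digest('hex');
}

// create a database entry "user:salt:hash" with a fresh random salt
// (assumes the user name itself contains no ":")
function mkEntry(user, password) {
  const salt = crypto.randomBytes(6).toString('hex');
  return [user, salt, hash(password + salt)].join(':');
}

// re-hash the typed-in password with the stored salt and compare
function verify(entry, password) {
  const [, salt, stored] = entry.split(':');
  return hash(password + salt) === stored;
}

const e1 = mkEntry('alice', '123456');
const e2 = mkEntry('bob', '123456');
console.log(verify(e1, '123456'));    // prints true
console.log(verify(e1, 'password'));  // prints false
// same password, but two random salts and therefore
// (with overwhelming probability) two different hash-values
console.log(e1.split(':')[2] === e2.split(':')[2]);
```

\noindent
The salt is stored in the clear next to the hash-value, as in the
Unix password file; the verification only needs to read it back.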