sen-material: comparison handouts/ho01.tex

equal deleted inserted replaced

-:a736a0c324a3
+:681e35f6b0e4
 Lets look at another example which will help with
 understanding how passwords should be verified and stored.
 Imagine you need to develop a web-application that has the
 feature of recording how many times a customer visits a page.
 For example in order to give a discount whenever the customer
-visited a webpage some $x$ number of times (say $x$ equal
+has visited a webpage some $x$ number of times (say $x$ equal
 $5$). There is one more constraint: we want to store the
 information about the number of visits as a cookie on the
 browser. I think, for a number of years the webpage of the New
 York Times operated in this way: it allowed you to read ten
 articles per month for free; if you wanted to read more, you
 To implement our web-application it is good to look under the
 hood what happens when a webpage is displayed in a browser. A
 typical web-application works as follows: The browser sends a
 GET request for a particular page to a server. The server
-answers this request with a webpage in HTML (we can here
+answers this request with a webpage in HTML (for our purposes
-ignore these details). A simple JavaScript program that
+we can ignore the details about HTML). A simple JavaScript
-realises a ``hello world'' webpage is as follows:
+program that realises a server answering with a ``hello
+world'' webpage is as follows:
 \begin{center}
 \lstinputlisting{../progs/ap0.js}
 \end{center}
 web-page. This is actually a trivial task for a knowledgeable
 person, since there are convenient tools that allow one to set
 a cookie to an arbitrary value, for example above our
 threshold for the discount.
-There seems to be no real way to prevent this kind of
+There seems to be no simple way to prevent this kind of
 tampering with cookies, because the whole purpose of cookies
 is that they are stored on the client's side, which from the
 the server's perspective is a potentially hostile environment.
 What we need to ensure is the integrity of this counter in
 this hostile environment. We could think of encrypting the
 other hand, we use many ``private'' keys for the clients, then
 we have to solve the problem of having to securely store this
 key on our server side (obviously we cannot store the key with
 the client because then the client again has all data to
 tamper with the counter; and obviously we also cannot encrypt
-the key, lest we can solve a chicken-and-egg problem). So
+the key, lest we can solve an impossible chicken-and-egg
-encryption seems to not solve the problem we face with the
+problem). So encryption seems to not solve the problem we face
-integrity of our counter.
+with the integrity of our counter.
 Fortunately, \emph{hash functions} seem to be more suitable
 for our purpose. Like encryption, hash functions scramble data
 in such a way that it is easy to calculate the output of a
 hash function from the input. But it is hard (i.e.~practically
 impossible) to calculate the input from knowing the output.
 Therefore hash functions are often called \emph{one-way
-functions}. There are several such hashing function. For
+functions}\ldots you cannot go back from the output to the
-example SHA-1 would hash the string \pcode{"hello world"} to
+input (without some tricks, see below). There are several such
-produce the hash-value
+hashing function. For example SHA-1 would hash the string
+\pcode{"hello world"} to produce the hash-value
 \begin{center}
 \pcode{2aae6c35c94fcfb415dbe95f408b9ce91ee846ed}
 \end{center}
 will be from just looking at input that is ``close by''.
 We can use hashes in our web-application and store in the
 cookie the value of the counter in plain text but together
 with its hash. We need to store both pieces of data in such a
-way that we can extract both components (below I will just
+way that we can extract them again later on (in the code below
-separate them using a \pcode{"-"}). If we now read back the
+I will just separate them using a \pcode{"-"}). If we now read
-cookie when the client visits our webpage, we can extract the
+back the cookie when the client visits our webpage, we can
-counter, hash it again and compare the result to the stored
+extract the counter, hash it again and compare the result to
-hash value inside the cookie. If these hashes disagree, then
+the stored hash value inside the cookie. If these hashes
-we can deduce that the cookie has been tampered with.
+disagree, then we can deduce that the cookie has been tampered
-Unfortunately, if they agree, we can still not be entirely
+with. Unfortunately, if they agree, we can still not be
-sure that not a clever hacker has tampered with the cookie.
+entirely sure that not a clever hacker has tampered with the
-The reason is that the hacker can see the clear text part of
+cookie. The reason is that the hacker can see the clear text
-the cookie, say \pcode{3}, and also its hash. It does not take
+part of the cookie, say \pcode{3}, and also its hash. It does
-much trial and error to find out that we used the SHA-1
+not take much trial and error to find out that we used the
-hashing functions and then the hacker can graft a cookie
+SHA-1 hashing function and then the hacker can graft a cookie
 accordingly. This is eased by the fact that for SHA-1 many
 strings and corresponding hash-values are precalculated. Type,
 for example, into Google the hash value for \pcode{"hello
 world"} and you will actually pretty quickly find that it was
-generated by input string \pcode{"hello wolrd"}. This defeats
+generated by input string \pcode{"hello world"}. This defeats
-the purpose of a hashing functions and thus would not help us
+the purpose of a hashing function and thus would not help us
-with our web-applications and later with how to store
+with our web-applications and later also not with how to store
 passwords properly.
 There is one ingredient missing, which happens to be called
 \emph{salts}. Salts are random keys, which are added to the
 \end{center}
 \noindent These hashes allow us to read and set the value of
 the counter, and also give us confidence that the counter has
 not been tampered with. This of course depends on being able
-to keep the salt secret. Once this is out, we better ignore
+to keep the salt secret. Once the salt is public, we better
-all cookies and start setting them again with a new salt.
+ignore all cookies and start setting them again with a new
+salt.
 There is an interesting and very subtle point to note with
 respect to the New York Times' way of checking the number
 visits. Essentially they have their `resource' unlocked at the
 beginning and lock it only when the data in the cookie states
 that the allowed free number of visits are up. As said before,
 this can be easily circumvented by just deleting the cookie or
 by switching the browser. This would mean the New York Times
-will lose revenue whenever this kind of tampering occurs. In
+will lose revenue whenever this kind of tampering occurs. The
-contrast, our web-application has the resource (discount)
+quick fix to require that a cookie must always be present does
-locked at the beginning and only unlocks it if the cookie data
+not work, because then this newspaper will cut off any new
-says so. If the cookie is deleted, well then the resource just
+readers, or anyone who gets a new computer. In contrast, our
-does not get unlocked. No mayor harm will result to us. You
+web-application has the resource (discount) locked at the
-can see: the same security mechanism behaves rather
+beginning and only unlocks it if the cookie data says so. If
-differently depending on whether the ``resource'' needs to be
+the cookie is deleted, well then the resource just does not
-locked or unlocked. Apart from think about the difference very
+get unlocked. No mayor harm will result to us. You can see:
-carefully, I do not know of any ``theory'' that could help
+the same security mechanism behaves rather differently
-with solving such security intricacies in any automatic way.
+depending on whether the ``resource'' needs to be locked or
+unlocked. Apart from thinking about the difference very
-\subsection*{How to Store Passwords}
+carefully, I do not know of any good ``theory'' that could
+help with solving such security intricacies in any other way.
+\subsection*{How to Store Passwords Properly?}
 While admittedly quite silly, the simple web-application in
 the previous section should help with the more important
 question of how passwords should be verified and stored. It is
 unbelievable that nowadays systems still do this with
 passwords in plain text. The idea behind such plain-text
-passwords is of course that if the user typed in \emph{foobar}
+passwords is of course that if the user typed in
-as password, we need to verify whether it matches with the
+\pcode{foobar} as password, we need to verify whether it
-password that is already stored for this user in the system.
+matches with the password that is already stored for this user
-But doing this verification in plain text is really a bad
+in the system. But doing this verification in plain text is
-idea. Unfortunately, evidence suggests, however, it is still a
+really a bad idea. Unfortunately, evidence suggests, however,
-widespread practice. I leave you to think about why verifying
+it is still a widespread practice. I leave you to think about
-passwords in plain text is a bad idea.
+why verifying passwords in plain text is a bad idea.
-Using hash functions we can do better. It is not really
+Using hash functions, like in our web-application, we can do
-necessary to store passwords in plain text in order to verify
+better. They allow us to not having to store passwords in
-whether a password matches. We can just hash the password in
+plain text for verification whether a password matches or not.
-order to be stored. And whenever the user types in a new
+We can just hash the password and store the hash-value. And
-password, well then we hash it again and check whether the
+whenever the user types in a new password, well then we hash
-hash-values agree. Lets analyse what happens when a hacker
+it again and check whether the hash-values agree. Just like
-gets hold of our password database. In case the passwords are
+in the web-application before.
-hashed, the hacker has a list of user names and associated
-hash-values, like
+Lets analyse what happens when a hacker gets hold of such a
+hashed password database. The hacker has then a list of user
+names and associated hash-values, like
 \begin{center}
 \pcode{urbanc:2aae6c35c94fcfb415dbe95f408b9ce91ee846ed}
 \end{center}
-\noindent For a beginner-level hacker this information
+\noindent For a beginner-level hacker this information is of
-is of no use. It would not work to type in the hash value
+no use. It would not work to type in the hash value instead of
-instead of the password, because it will go through the
+the password, because it will go through the hashing function
-hashing function again and then the resulting two
+again and then the resulting two hash-values will not match.
-hash-values will not match. One attack a hacker can try is
+One attack a hacker can try, however, is called a \emph{brute
-called a \emph{brute force attack}. Essentially this means
+force attack}. Essentially this means trying out exhaustively
-trying out exhaustively all strings
+all strings
 \begin{center}
 \pcode{a},
 \pcode{aa},
 \pcode{...},
 \pcode{...},
 \pcode{zzz},
 \pcode{...}
 \end{center}
-\noindent hash them and check whether they match with the
+\noindent and so on, hash them and check whether they match
-hash-values in the database. Such brute force attacks are
+with the hash-values in the database. Such brute force attacks
-surprisingly effective. With modern technology (usually GPU
+are surprisingly effective. With modern technology (usually
-graphic cards), passwords of moderate length
+GPU graphic cards), passwords of moderate length only needs
+seconds or hours to be cracked. Well the only defence we have
-%The corresponding attack is called \emph{dictionary
+is to make passwords longer and force users to use the whole
-%attack}\ldots hashes are not reversed by brute force
+spectrum of letters and keys for passwords in order to make
-%calculations, that is trying out all possible combinations.
+the search space to big for an effective brute force attack.
+Unfortunately, clever hackers have another ace up their
-%We have to make sure the salt does not get known.
+sleeves. These are called \emph{dictionary attacks}. The idea
+behind dictionary attack is the observation that only few
+people are competent enough to use sufficiently strong
-%Note ....NYT
+passwords. Most users (at least too many) use passwords like
+\begin{center}
+\pcode{123456},
+\pcode{password},
+\pcode{qwerty},
+\pcode{letmein},
+\pcode{...}
+\end{center}
+\noindent So an attacker just needs to compile a list
+as large as possible of such likely candidates of passwords
+and also compute their hash-values. Now if the attacker
+knows the hash-value of a password is
+\begin{center}
+\pcode{5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8}
+\end{center}
+\noindent then just a lookup in the dictionary will reveal
+that the plain-text password was \pcode{password}. What is
+good about this attack is that the dictionary can be
+precompiled in the ``comfort of the hacker's home'' before an
+actual attack is launched. It just needs sufficient storage
+space, which nowadays is pretty cheap. A hacker might in this
+way not be able to crack all passwords in our database, but
+even being able to crack 50\% can be serious damage for a
+large company (because then you have to think how to make
+users to change their old passwords). And hackers are very
+industrious in compiling these dictionaries: for example they
+definitely include variations like \pcode{passw0rd} and also
+includes rules that cover cases like \pcode{passwordpassword}
+or \pcode{drowssap} (password reversed). Historically,
+compiling a list for a dictionary attack is not as simple as
+it might seem. At the beginning only ``real'' dictionaries
+were available (like the Oxford English Dictionary), but such
+dictionary are not ``optimised'' for the purpose of passwords.
+The first real hard date was obtained when a company called
+RockYou ``lost'' 32 Million plain-text password. With this
+data of real-life passwords, dictionary attacks took off.
+These dictionary attacks can be prevented by using salts.
+Remember a hacker needs to use the most likely candidates
+of passwords and calculate their has-value. If we add before
+hashing a password with a random salt, like \pcode{mPX2aq},
+then the string \pcode{passwordmPX2aq} will almost certainly
+not be in the dictionary. Like in the web-application in the
+previous section a salt does not prevent us from verifying a
+password. We just need to add the salt whenever the password
+is typed in again.
+There is a question whether we should us a single random salt
+for every password in our database. A single salt would
+already make dictionary attacks considerably more difficult.
+It turns out, however, that in case of password databases
+every password should get their own salt. This salt is
+generated at the time when the password is first set.
+If you look at a Unix password file you will find entries like
+\begin{center}
+\pcode{urbanc:$6$3WWbKfr1$4vblknvGr6FcDeF92R5xFn3mskfdnEn...:...}
+\end{center}
+\noindent where the first part is the login-name, followed
+by a field \pcode{$6$} which specifies which hash-function
+is used. After that follows the salt \pcode{3WWbKfr1} and
+after that the hash-value that is stored for the password plus
+salt. I leave it to you to figure out how the password
+verification would need to work based on this data.
+There is a non-obvious benefit of using a separate salt for
+each password. Recall that \pcode{123456} is a popular
+password that is most likely used by several of your users
+(especially if the database contains millions of entries). If
+we use no salt or one global salt, all hash-values will be the
+same for this password. So if a hacker is in the business of
+cracking as much passwords as possible, then it is a good idea
+to concentrate on those very popular passwords. This is not
+possible if each password gets its own salt: since we assume
+the salt is generated randomly, each version of \pcode{123456}
+will be associated with a different hash-value.
+Note another interesting point. The web-application from the
+previous section was only secure when the salt was secret. In
+the password case, this is not needed. The salt can be public
+as shown above and is actually stored as part of the password
+entry. Knowing the salt does not give the attacker any
+advantage, but prevents that dictionaries can be precompiled.
 \end{document}
 %%% Local Variables:
 %%% mode: latex
 %%% TeX-master: t

changeset 182	681e35f6b0e4
parent 181	a736a0c324a3
child 183	6ed7c9b8b291