--- a/handouts/ho01.tex Thu Sep 25 14:39:56 2014 +0100
+++ b/handouts/ho01.tex Fri Sep 26 02:42:00 2014 +0100
@@ -182,7 +182,7 @@
Imagine you need to develop a web-application that has the
feature of recording how many times a customer visits a page.
For example in order to give a discount whenever the customer
-visited a webpage some $x$ number of times (say $x$ equal
+has visited a webpage some $x$ number of times (say $x$ equal
$5$). There is one more constraint: we want to store the
information about the number of visits as a cookie on the
browser. I think, for a number of years the webpage of the New
@@ -197,9 +197,10 @@
hood what happens when a webpage is displayed in a browser. A
typical web-application works as follows: The browser sends a
GET request for a particular page to a server. The server
-answers this request with a webpage in HTML (we can here
-ignore these details). A simple JavaScript program that
-realises a ``hello world'' webpage is as follows:
+answers this request with a webpage in HTML (for our purposes
+we can ignore the details about HTML). A simple JavaScript
+program that realises a server answering with a ``hello
+world'' webpage is as follows:
\begin{center}
\lstinputlisting{../progs/ap0.js}
@@ -258,7 +259,7 @@
a cookie to an arbitrary value, for example above our
threshold for the discount.
-There seems to be no real way to prevent this kind of
+There seems to be no simple way to prevent this kind of
tampering with cookies, because the whole purpose of cookies
is that they are stored on the client's side, which from the
the server's perspective is a potentially hostile environment.
@@ -275,9 +276,9 @@
key on our server side (obviously we cannot store the key with
the client because then the client again has all data to
tamper with the counter; and obviously we also cannot encrypt
-the key, lest we can solve a chicken-and-egg problem). So
-encryption seems to not solve the problem we face with the
-integrity of our counter.
+the key, lest we can solve an impossible chicken-and-egg
+problem). So encryption seems to not solve the problem we face
+with the integrity of our counter.
Fortunately, \emph{hash functions} seem to be more suitable
for our purpose. Like encryption, hash functions scramble data
@@ -285,9 +286,10 @@
hash function from the input. But it is hard (i.e.~practically
impossible) to calculate the input from knowing the output.
Therefore hash functions are often called \emph{one-way
-functions}. There are several such hashing function. For
-example SHA-1 would hash the string \pcode{"hello world"} to
-produce the hash-value
+functions}\ldots you cannot go back from the output to the
+input (without some tricks, see below). There are several such
+hashing function. For example SHA-1 would hash the string
+\pcode{"hello world"} to produce the hash-value
\begin{center}
\pcode{2aae6c35c94fcfb415dbe95f408b9ce91ee846ed}
@@ -308,25 +310,25 @@
We can use hashes in our web-application and store in the
cookie the value of the counter in plain text but together
with its hash. We need to store both pieces of data in such a
-way that we can extract both components (below I will just
-separate them using a \pcode{"-"}). If we now read back the
-cookie when the client visits our webpage, we can extract the
-counter, hash it again and compare the result to the stored
-hash value inside the cookie. If these hashes disagree, then
-we can deduce that the cookie has been tampered with.
-Unfortunately, if they agree, we can still not be entirely
-sure that not a clever hacker has tampered with the cookie.
-The reason is that the hacker can see the clear text part of
-the cookie, say \pcode{3}, and also its hash. It does not take
-much trial and error to find out that we used the SHA-1
-hashing functions and then the hacker can graft a cookie
+way that we can extract them again later on (in the code below
+I will just separate them using a \pcode{"-"}). If we now read
+back the cookie when the client visits our webpage, we can
+extract the counter, hash it again and compare the result to
+the stored hash value inside the cookie. If these hashes
+disagree, then we can deduce that the cookie has been tampered
+with. Unfortunately, if they agree, we can still not be
+entirely sure that not a clever hacker has tampered with the
+cookie. The reason is that the hacker can see the clear text
+part of the cookie, say \pcode{3}, and also its hash. It does
+not take much trial and error to find out that we used the
+SHA-1 hashing function and then the hacker can graft a cookie
accordingly. This is eased by the fact that for SHA-1 many
strings and corresponding hash-values are precalculated. Type,
for example, into Google the hash value for \pcode{"hello
world"} and you will actually pretty quickly find that it was
-generated by input string \pcode{"hello wolrd"}. This defeats
-the purpose of a hashing functions and thus would not help us
-with our web-applications and later with how to store
+generated by input string \pcode{"hello world"}. This defeats
+the purpose of a hashing function and thus would not help us
+with our web-applications and later also not with how to store
passwords properly.
@@ -359,8 +361,9 @@
\noindent These hashes allow us to read and set the value of
the counter, and also give us confidence that the counter has
not been tampered with. This of course depends on being able
-to keep the salt secret. Once this is out, we better ignore
-all cookies and start setting them again with a new salt.
+to keep the salt secret. Once the salt is public, we better
+ignore all cookies and start setting them again with a new
+salt.
There is an interesting and very subtle point to note with
respect to the New York Times' way of checking the number
@@ -369,53 +372,58 @@
that the allowed free number of visits are up. As said before,
this can be easily circumvented by just deleting the cookie or
by switching the browser. This would mean the New York Times
-will lose revenue whenever this kind of tampering occurs. In
-contrast, our web-application has the resource (discount)
-locked at the beginning and only unlocks it if the cookie data
-says so. If the cookie is deleted, well then the resource just
-does not get unlocked. No mayor harm will result to us. You
-can see: the same security mechanism behaves rather
-differently depending on whether the ``resource'' needs to be
-locked or unlocked. Apart from think about the difference very
-carefully, I do not know of any ``theory'' that could help
-with solving such security intricacies in any automatic way.
+will lose revenue whenever this kind of tampering occurs. The
+quick fix to require that a cookie must always be present does
+not work, because then this newspaper will cut off any new
+readers, or anyone who gets a new computer. In contrast, our
+web-application has the resource (discount) locked at the
+beginning and only unlocks it if the cookie data says so. If
+the cookie is deleted, well then the resource just does not
+get unlocked. No mayor harm will result to us. You can see:
+the same security mechanism behaves rather differently
+depending on whether the ``resource'' needs to be locked or
+unlocked. Apart from thinking about the difference very
+carefully, I do not know of any good ``theory'' that could
+help with solving such security intricacies in any other way.
-\subsection*{How to Store Passwords}
+\subsection*{How to Store Passwords Properly?}
While admittedly quite silly, the simple web-application in
the previous section should help with the more important
question of how passwords should be verified and stored. It is
unbelievable that nowadays systems still do this with
passwords in plain text. The idea behind such plain-text
-passwords is of course that if the user typed in \emph{foobar}
-as password, we need to verify whether it matches with the
-password that is already stored for this user in the system.
-But doing this verification in plain text is really a bad
-idea. Unfortunately, evidence suggests, however, it is still a
-widespread practice. I leave you to think about why verifying
-passwords in plain text is a bad idea.
+passwords is of course that if the user typed in
+\pcode{foobar} as password, we need to verify whether it
+matches with the password that is already stored for this user
+in the system. But doing this verification in plain text is
+really a bad idea. Unfortunately, evidence suggests, however,
+it is still a widespread practice. I leave you to think about
+why verifying passwords in plain text is a bad idea.
-Using hash functions we can do better. It is not really
-necessary to store passwords in plain text in order to verify
-whether a password matches. We can just hash the password in
-order to be stored. And whenever the user types in a new
-password, well then we hash it again and check whether the
-hash-values agree. Lets analyse what happens when a hacker
-gets hold of our password database. In case the passwords are
-hashed, the hacker has a list of user names and associated
-hash-values, like
+Using hash functions, like in our web-application, we can do
+better. They allow us to not having to store passwords in
+plain text for verification whether a password matches or not.
+We can just hash the password and store the hash-value. And
+whenever the user types in a new password, well then we hash
+it again and check whether the hash-values agree. Just like
+in the web-application before.
+
+Lets analyse what happens when a hacker gets hold of such a
+hashed password database. The hacker has then a list of user
+names and associated hash-values, like
\begin{center}
\pcode{urbanc:2aae6c35c94fcfb415dbe95f408b9ce91ee846ed}
\end{center}
-\noindent For a beginner-level hacker this information
-is of no use. It would not work to type in the hash value
-instead of the password, because it will go through the
-hashing function again and then the resulting two
-hash-values will not match. One attack a hacker can try is
-called a \emph{brute force attack}. Essentially this means
-trying out exhaustively all strings
+\noindent For a beginner-level hacker this information is of
+no use. It would not work to type in the hash value instead of
+the password, because it will go through the hashing function
+again and then the resulting two hash-values will not match.
+One attack a hacker can try, however, is called a \emph{brute
+force attack}. Essentially this means trying out exhaustively
+all strings
\begin{center}
\pcode{a},
@@ -427,20 +435,109 @@
\pcode{...}
\end{center}
-\noindent hash them and check whether they match with the
-hash-values in the database. Such brute force attacks are
-surprisingly effective. With modern technology (usually GPU
-graphic cards), passwords of moderate length
+\noindent and so on, hash them and check whether they match
+with the hash-values in the database. Such brute force attacks
+are surprisingly effective. With modern technology (usually
+GPU graphic cards), passwords of moderate length only needs
+seconds or hours to be cracked. Well the only defence we have
+is to make passwords longer and force users to use the whole
+spectrum of letters and keys for passwords in order to make
+the search space to big for an effective brute force attack.
+
+Unfortunately, clever hackers have another ace up their
+sleeves. These are called \emph{dictionary attacks}. The idea
+behind dictionary attack is the observation that only few
+people are competent enough to use sufficiently strong
+passwords. Most users (at least too many) use passwords like
+
+\begin{center}
+\pcode{123456},
+\pcode{password},
+\pcode{qwerty},
+\pcode{letmein},
+\pcode{...}
+\end{center}
+
+\noindent So an attacker just needs to compile a list
+as large as possible of such likely candidates of passwords
+and also compute their hash-values. Now if the attacker
+knows the hash-value of a password is
+
+\begin{center}
+\pcode{5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8}
+\end{center}
-%The corresponding attack is called \emph{dictionary
-%attack}\ldots hashes are not reversed by brute force
-%calculations, that is trying out all possible combinations.
+\noindent then just a lookup in the dictionary will reveal
+that the plain-text password was \pcode{password}. What is
+good about this attack is that the dictionary can be
+precompiled in the ``comfort of the hacker's home'' before an
+actual attack is launched. It just needs sufficient storage
+space, which nowadays is pretty cheap. A hacker might in this
+way not be able to crack all passwords in our database, but
+even being able to crack 50\% can be serious damage for a
+large company (because then you have to think how to make
+users to change their old passwords). And hackers are very
+industrious in compiling these dictionaries: for example they
+definitely include variations like \pcode{passw0rd} and also
+includes rules that cover cases like \pcode{passwordpassword}
+or \pcode{drowssap} (password reversed). Historically,
+compiling a list for a dictionary attack is not as simple as
+it might seem. At the beginning only ``real'' dictionaries
+were available (like the Oxford English Dictionary), but such
+dictionary are not ``optimised'' for the purpose of passwords.
+The first real hard date was obtained when a company called
+RockYou ``lost'' 32 Million plain-text password. With this
+data of real-life passwords, dictionary attacks took off.
+
+These dictionary attacks can be prevented by using salts.
+Remember a hacker needs to use the most likely candidates
+of passwords and calculate their has-value. If we add before
+hashing a password with a random salt, like \pcode{mPX2aq},
+then the string \pcode{passwordmPX2aq} will almost certainly
+not be in the dictionary. Like in the web-application in the
+previous section a salt does not prevent us from verifying a
+password. We just need to add the salt whenever the password
+is typed in again.
+
+There is a question whether we should us a single random salt
+for every password in our database. A single salt would
+already make dictionary attacks considerably more difficult.
+It turns out, however, that in case of password databases
+every password should get their own salt. This salt is
+generated at the time when the password is first set.
+If you look at a Unix password file you will find entries like
+
+\begin{center}
+\pcode{urbanc:$6$3WWbKfr1$4vblknvGr6FcDeF92R5xFn3mskfdnEn...:...}
+\end{center}
+
+\noindent where the first part is the login-name, followed
+by a field \pcode{$6$} which specifies which hash-function
+is used. After that follows the salt \pcode{3WWbKfr1} and
+after that the hash-value that is stored for the password plus
+salt. I leave it to you to figure out how the password
+verification would need to work based on this data.
+
+There is a non-obvious benefit of using a separate salt for
+each password. Recall that \pcode{123456} is a popular
+password that is most likely used by several of your users
+(especially if the database contains millions of entries). If
+we use no salt or one global salt, all hash-values will be the
+same for this password. So if a hacker is in the business of
+cracking as much passwords as possible, then it is a good idea
+to concentrate on those very popular passwords. This is not
+possible if each password gets its own salt: since we assume
+the salt is generated randomly, each version of \pcode{123456}
+will be associated with a different hash-value.
+
+Note another interesting point. The web-application from the
+previous section was only secure when the salt was secret. In
+the password case, this is not needed. The salt can be public
+as shown above and is actually stored as part of the password
+entry. Knowing the salt does not give the attacker any
+advantage, but prevents that dictionaries can be precompiled.
-%We have to make sure the salt does not get known.
-
-
-%Note ....NYT
\end{document}
%%% Local Variables: