Binary file handouts/ho01.pdf has changed
Binary file handouts/ho02.pdf has changed
Binary file handouts/ho03.pdf has changed
Binary file handouts/ho04.pdf has changed
Binary file handouts/ho05.pdf has changed
Binary file handouts/ho06.pdf has changed
--- a/handouts/ho06.tex Sat Sep 23 19:39:53 2017 +0100
+++ b/handouts/ho06.tex Sat Sep 23 19:52:27 2017 +0100
@@ -1,620 +1,559 @@
\documentclass{article}
\usepackage{../style}
+\usepackage{../graphics}
\begin{document}
\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015}
-\section*{Handout 6 (Zero-Knowledge Proofs)}
+%https://www.theguardian.com/technology/2016/oct/04/yahoo-secret-email-program-nsa-fbi
+%https://nakedsecurity.sophos.com/2015/11/12/california-collects-owns-and-sells-infants-dna-samples/
+%http://randomwalker.info/teaching/fall-2012-privacy-technologies/?
+%https://josephhall.org/papers/NYU-MCC-1303-S2012_privacy_syllabus.pdf
+%http://www.jetlaw.org/wp-content/uploads/2014/06/Bambauer_Final.pdf
+%http://www.cs.cmu.edu/~yuxiangw/docs/Differential%20Privacy.pdf
+%https://www.youtube.com/watch?v=Gx13lgEudtU
+%https://fpf.org/wp-content/uploads/Differential-Privacy-as-a-Response-to-the-Reidentification-Threat-Klinefelter-and-Chin.pdf
+%http://research.neustar.biz/2014/09/08/differential-privacy-the-basics/
+
+%=====
+%Tim Greene, Network World, 17 Dec 2015 (via ACM TechNews, 18 Dec 2015)
+%
+%Massachusetts Institute of Technology (MIT) researchers' experimental
+%Vuvuzela messaging system offers more privacy than The Onion Router (Tor) by
+%rendering text messages sent through it untraceable. MIT Ph.D. student
+%David Lazar says Vuvuzela resists traffic analysis attacks, while Tor
+%cannot. The researchers say the system functions no matter how many parties
+%are using it to communicate, and it employs encryption and a set of servers
+%to conceal whether or not parties are participating in text-based dialogues.
+%"Vuvuzela prevents an adversary from learning which pairs of users are
+%communicating, as long as just one out of [the] servers is not compromised,
+%even for users who continue to use Vuvuzela for years," they note. Vuvuzela
+%can support millions of users hosted on commodity servers deployed by a
+%single group of users. Instead of anonymizing users, Vuvuzela prevents
+%outside observers from differentiating between people sending messages,
+%receiving messages, or neither, according to Lazar. The system imposes
+%noise on the client-server traffic which cannot be distinguished from actual
+%messages, and all communications are triple-wrapped in encryption by three
+%servers. "Vuvuzela guarantees privacy as long as one of the servers is
+%uncompromised, so using more servers increases security at the cost of
+%increased message latency," Lazar notes.
+%http://orange.hosting.lsoft.com/trk/click?ref=znwrbbrs9_5-e70bx2d991x066779&
-Zero-knowledge proofs (short ZKP) solve a paradoxical puzzle:
-How to convince somebody else that one knows a secret, without
-revealing what the secret actually is? This sounds like a
-problem the Mad Hatter from Alice in Wonderland would occupy
-himself with, but actually there some serious and not so
-serious applications of it. For example, if you solve
-crosswords with your friend, say Bob, you might want to
-convince him that you found a solution for one question, but
-of course you do not want to reveal the solution, as this
-might give Bob an advantage somewhere else in the crossword.
+%%%%
+%% canvas tracking
+%%https://freedom-to-tinker.com/blog/englehardt/the-princeton-web-census-a-1-million-site-measurement-and-analysis-of-web-privacy/
+
+%%%
+%% cupit re-identification attack
+%% https://nakedsecurity.sophos.com/2016/05/20/published-personal-data-on-70000-okcupid-users-taken-down-after-dmca-order/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+nakedsecurity+%28Naked+Security+-+Sophos%29
+
+%Differential privacy
+%=====================
+%https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/
+
+%Differential privacy, translated from Apple-speak, is the
+%statistical science of trying to learn as much as possible
+%about a group while learning as little as possible about any
+%individual in it.
-So how to convince Bob that you know the answer (or a secret)?
-One way would be to come up with the following protocol:
-Suppose the answer is \textit{folio}. Then look up the
-definition of \textit{folio} in a dictionary. Say you find:
+%As Roth notes when he refers to a “mathematical proof,”
+%differential privacy doesn’t merely try to obfuscate or
+%“anonymize” users’ data. That anonymization approach, he
+%argues, tends to fail. In 2007, for instance, Netflix released
+%a large collection of its viewers’ film ratings as part of a
+%competition to optimize its recommendations, removing people’s
+%names and other identifying details and publishing only their
+%Netflix ratings. But researchers soon cross-referenced the
+%Netflix data with public review data on IMDB to match up
+%similar patterns of recommendations between the sites and add
+%names back into Netflix’s supposedly anonymous database.
+
+%As an example of that last method, Microsoft’s Dwork points to
+%the technique in which a survey asks if the respondent has
+%ever, say, broken a law. But first, the survey asks them to
+%flip a coin. If the result is tails, they should answer
+%honestly. If the result is heads, they’re instructed to flip
+%the coin again and then answer “yes” for heads or “no” for
+%tails. The resulting random noise can be subtracted from the
+%results with a bit of algebra, and every respondent is
+%protected from punishment if they admitted to lawbreaking.
-\begin{quote}
-``an \textit{individual} leaf of paper or parchment, either
-loose as one of a series or forming part of a bound volume,
-which is numbered on the recto or front side only.''
-\end{quote}
+%https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
+
+% Windows 10 data send back to Microsoft (Cortana)
+%Here’s a non-exhaustive list of data sent back: location data, text
+%input, voice input, touch input, webpages you visit, and telemetry
+%data regarding your general usage of your computer, including which
+%programs you run and for how long.
-\noindent
-Take the first non-small word\footnote{Let's say the, a, an,
-or, and \ldots fall into the category of small words.} in this definition,
-in this case \textit{individual}, and look up the definition
-of this word, say
+% Businesses are already using customised pricing online based on
+% information they can glean about you. It is hard to know how
+% widespread the practice is; companies keep their pricing strategies
+% closely guarded and are wary of the bad PR price discrimination
+% could pose. However, it is clear that a number of large retailers
+% are experimenting with it. Staples, for example, has offered
+% discounted prices based on whether rival stores are within 20 miles
+% of its customers’ location. Office Depot has admitted to using its
+% customers’ browsing history and location to vary its range of offers
+% and products. A 2014 study from Northeastern University found
+% evidence of “steering” or differential pricing at four out of 10
+% general merchandise websites and five out of five travel
+% websites. (Steering is when a company doesn’t give you a customised
+% price, but points you towards more expensive options if it thinks
+% you will pay more.) The online travel company Orbitz raised
+% headlines in 2012 when it emerged that the firm was pointing Mac
+% users towards higher-priced hotel rooms than PC users.
+
+
+%%% government will overwrite your wishes if it is annoymous
+%% https://www.lightbluetouchpaper.org/2016/12/05/government-u-turn-on-health-privacy/
+
+%% corporate surveilance / privacy - report and CC3C talk
+%% http://crackedlabs.org/en/networksofcontrol
+%% https://media.ccc.de/v/33c3-8414-corporate_surveillance_digital_tracking_big_data_privacy#video&t=2933
+
+\section*{Handout 6 (Privacy)}
-\begin{quote}
-``a single \textit{human} being as distinct from a group''
-\end{quote}
+The first motor car was invented around 1886. For ten years,
+until 1896, the law in the UK (and elsewhere) required a
+person to walk in front of any moving car waving a red flag.
+Cars were such a novelty that most people did not know what to
+make of them. The person with the red flag was intended to
+warn the public, for example horse owners, about the impending
+novelty---a car. In my humble opinion, we are at the same
+stage of development with privacy. Nobody really knows what it
+is about or what it is good for. All seems very hazy. There
+are a few laws (e.g.~cookie law, right-to-be-forgotten law)
+which address problems with privacy, but even if they are well
+intentioned, they either back-fire or are already obsolete
+because of newer technologies. The result is that the world of
+``privacy'' looks a little bit like the old Wild
+West---lawless and mythical.
-\noindent
-In this definition take the second non-small word, that
-is \textit{human}, and again look up the definition of this
-word. This will yield
+We would have hoped that after Snowden, Western governments
+would be a bit more sensitive and enlightned about the topic
+of privacy, but this is far from the truth. Ross Anderson
+wrote the following in his blog\footnote{\url{https://www.lightbluetouchpaper.org/2016/02/11/report-on-the-ip-bill/}} about the approach taken in
+the US to lessons learned from the Snowden leaks and contrasts
+this with the new snooping bill that is considered in the UK
+parliament:
-\begin{quote}
-``relating to or \textit{characteristic} of humankind''
+\begin{quote}\it
+``The comparison with the USA is stark. There, all three
+branches of government realised they'd gone too far after
+Snowden. President Obama set up the NSA review group, and
+implemented most of its recommendations by executive order;
+the judiciary made changes to the procedures of the FISA
+Court; and Congress failed to renew the data retention
+provisions in the Patriot Act (aided by the judiciary). Yet
+here in Britain the response is just to take Henry VIII powers
+to legalise all the illegal things that GCHQ had been up to,
+and hope that the European courts won't strike the law down
+yet again.''
\end{quote}
-\noindent
-You could go on looking up the definition of the third
-non-small word in this definition and so on. But let us assume
-you agreed with Bob to stop after three iterations with the
-third non-article word in the last definition, that is
-\textit{or}. Now, instead of sending to Bob the solution
-\textit{folio}, you send to him \textit{characteristic}.
+\noindent Unfortunately, also big organisations besides
+governments seem to take an unenlightened approach to privacy.
+For example, UCAS, a charity set up to help students with
+applying to universities in the UK, has a commercial unit that
+happily sells your email addresses to anybody who forks out
+enough money for bombarding you with spam. Yes, you can opt
+out very often from such ``schemes'', but in case of UCAS any
+opt-out will limit also legit emails you might actually be
+interested in.\footnote{The main objectionable point, in my
+opinion, is that the \emph{charity} everybody has to use for
+HE applications has actually very honourable goals
+(e.g.~assist applicants in gaining access to universities),
+but the small print (or better the link ``About us'') reveals
+they set up their organisation so that they can also
+shamelessly sell the email addresses they ``harvest''.
+Everything is of course very legal\ldots{}ethical?\ldots{}well
+that is in the eye of the beholder. See:
-How can Bob verify that you know the solution? Well, once he
-solved it himself, he can use the dictionary and follow the
-same ``trail'' as you did. If the final word agrees with what
-you had sent him, he must infer you knew the solution earlier
-than him. This protocol works like a one-way hash function
-because \textit{characteristic} does not give any hint as to
-what was the first word was. I leave you to think why this
-protocol avoids small words?
+\url{http://www.ucas.com/about-us/inside-ucas/advertising-opportunities}
+or
+\url{http://www.theguardian.com/uk-news/2014/mar/12/ucas-sells-marketing-access-student-data-advertisers}}
-After Bob found his solution and verified that according to
-the protocol it ``maps'' also to \textit{characteristic}, can
-he be entirely sure no cheating is going on? Not with 100\%
-certainty. It could have been possible that he was given
-\textit{characteristic} as the word, but it derived from a
-different word. This might seem very unlikely, but at least
-theoretical it is a possibility. Protocols based on
-zero-knowledge proofs will produce a similar result---they
-give an answer that might be erroneous in a very small number
-of cases. The point is to iterate them long enough so that the
-theoretical possibility of cheating is negligibly small.
+Another example: Verizon, an ISP who is supposed to provide
+you just with connectivity, has found a ``nice'' side-business
+too: When you have enabled all privacy guards in your browser
+(the few you have at your disposal), Verizon happily adds a
+kind of cookie to your
+HTTP-requests.\footnote{\url{http://webpolicy.org/2014/10/24/how-verizons-advertising-header-works/}}
+As shown in the picture below, this cookie will be sent to
+every web-site you visit. The web-sites then can forward the
+cookie to advertisers who in turn pay Verizon to tell them
+everything they want to know about the person who just made
+this request, that is you.
+
+\begin{center}
+\includegraphics[scale=0.16]{../pics/verizon.png}
+\end{center}
-By the way, the authors of the paper ``Dismantling Megamos
-Crypto: Wirelessly Lockpicking a Vehicle Immobilizer'' who
-were barred from publishing their results used also a hash to
-prove they did the work and (presumably) managed to get into
-cars without a key; see Figure~\ref{paper}. This is very
-similar to the method above about crosswords: They like to
-prove that they did the work, but not giving out the
-``solution''. But this also shows what the problem with such a
-method is: yes, we can hide the secret temporarily, but if
-somebody else wants to verify it, then the secret has to be
-made public. Bob needs to know that \textit{folio} is the
-solution before he can verify the claim of Alice that she had
-the solution first. Similarly with the car-crypto paper: we
-needed to wait until September 2015 when the authors were
-finally able to publish their findings in order to verify the
-hash. Zero-knowledge proofs, in contrast, can be immediately
-checked, even if the secret is not public yet and perhaps
-never will be.
-
-\begin{figure}
-\begin{center}
-\addtolength{\fboxsep}{4mm}
-\fbox{\includegraphics[scale=0.4]{../pics/Dismantling_Megamos_Crypto.png}}
-\end{center}
-\caption{The authors of this paper used a hash in order to prove
-later that they have managed to break into cars.\label{paper}}
-\end{figure}
+\noindent How disgusting! Even worse, Verizon is not known for
+being the cheapest ISP on the planet (completely the
+contrary), and also not known for providing the fastest
+possible speeds, but rather for being among the few ISPs in
+the US with a quasi-monopolistic ``market distribution''.
-\subsubsection*{ZKP: An Illustrative Example}
+Well, we could go on and on\ldots{}and that has not even
+started us yet with all the naughty things NSA \& Friends are
+up to. Why does privacy actually matter? Nobody, I think, has
+a conclusive answer to this question yet. Maybe the following
+four notions help with clarifying the overall picture
+somewhat:
-The idea behind zero-knowledge proofs is not very obvious and
-will surely take some time for you to digest. Therefore let us
-start with a simple illustrative example. The example will not
-be perfect, but hopefully explain the gist of the idea. The
-example is called Alibaba's cave, which graphically looks as
-follows:\footnote{The example is taken from an
-article titled ``How to Explain Zero-Knowledge Protocols
-to Your Children'' available from
-\url{http://pages.cs.wisc.edu/~mkowalcz/628.pdf}.}
+\begin{itemize}
+\item \textbf{Secrecy} is the mechanism used to limit the
+ number of principals with access to information (e.g.,
+ cryptography or access controls). For example I better
+ keep my password secret, otherwise people from the wrong
+ side of the law might impersonate me.
+
+\item \textbf{Confidentiality} is the obligation to protect
+ the secrets of other people or organisations (secrecy
+ for the benefit of an organisation). For example as a
+ staff member at King's I have access to data, even
+ private data, I am allowed to use in my work but not
+ allowed to disclose to anyone else.
-\begin{center}
-\begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c}
-\includegraphics[scale=0.1]{../pics/alibaba1.png} &
-\includegraphics[scale=0.1]{../pics/alibaba2.png} &
-\includegraphics[scale=0.1]{../pics/alibaba3.png} \\
-Step 1 & Step 2 & Step 3
-\end{tabular}
-\end{center}
+\item \textbf{Anonymity} is the ability to leave no evidence of
+ an activity (e.g., sharing a secret). This is not equal
+ with privacy---anonymity is required in many
+ circumstances, for example for whistle-blowers,
+ voting, exam marking and so on.
-\noindent Let us take a closer look at the picture in Step 1:
-The cave has a tunnel which forks at some point. Both forks
-are connected in a loop. At the deep end of the loop is a
-magic wand. The point of the magic wand is that Alice knows
-the secret word for how to open it. She wants to keep the word
-secret, but wants to convince Bob that she knows it.
+\item \textbf{Privacy} is the ability or right to protect your
+ personal secrets (secrecy for the benefit of an
+ individual). For example, in a job interview, I might
+ not like to disclose that I am pregnant, if I were a
+ woman, or that I am a father. Lest they might not hire
+ me. Similarly, I might not like to disclose my location
+ data, because thieves might break into my house if they
+ know I am away at work. Privacy is essentially
+ everything which ``shouldn't be anybody's business''.
+
+\end{itemize}
+
+\noindent While this might provide us with some rough
+definitions, the problem with privacy is that it is an
+extremely fine line what should stay private and what should
+not. For example, since I am working in academia, I am every
+so often very happy to be a digital exhibitionist: I am very
+happy to disclose all `trivia' related to my work on my
+personal web-page. This is a kind of bragging that is normal
+in academia (at least in the field of CS), even expected if
+you look for a job. I am even happy that Google maintains a
+profile about all my academic papers and their citations.
-One way of course would be to let Bob follow her, but then he
-would also find out the secret. So this does not work. To find
-a solution, let us first fix the rules: At the beginning Alice
-and Bob are outside the cave. Alice goes in alone and takes
-either tunnel labelled $A$ in the picture, or the other tunnel
-labelled $B$. She waits at the magic wand for the instructions
-from Bob, who also goes into the gave and observes what
-happens at the fork. He has no knowledge which tunnel Alice
-took and calls out (in Step 2) that she should emerge from tunnel
-$A$, for example. If she knows the secret for opening the
-wand, this will not be a problem for Alice. If she was already
-in the $A$-segment of the tunnel, then she just comes back. If
-she was in the $B$-segment of the tunnel she will say the magic
-word and goes through the wand to emerge from $A$ as requested
-by Bob.
+On the other hand I would be very irritated if anybody I do
+not know had a too close look on my private live---it
+shouldn't be anybody's business. The reason is that knowledge
+about my private life can often be used against me. As mentioned
+above, public location data might mean I get robbed. If
+supermarkets build a profile of my shopping habits, they will
+use it to \emph{their} advantage---surely not to \emph{my}
+advantage. Also whatever might be collected about my life will
+always be an incomplete, or even misleading, picture. For
+example I am pretty sure my creditworthiness score was
+temporarily(?) destroyed by not having a regular income in
+this country (before coming to King's I worked in Munich for
+five years). To correct such incomplete or flawed credit
+history data there is, since recently, a law that allows you
+to check what information is held about you for determining
+your creditworthiness. But this concerns only a very small
+part of the data that is held about me/you. Also
+what about cases where data is wrong or outdated (but do we
+need a right-to be forgotten).
+
+
+To see how private matter can lead really to the wrong
+conclusions, take the example of Stephen Hawking: When he was
+diagnosed with his disease, he was given a life expectancy of
+two years. If employers would know about such problems, would
+they have employed Hawking? Now, he is enjoying his 70+
+birthday. Clearly personal medical data needs to stay private.
+
+To cut a long story short, I let you ponder about the two
+statements which are often voiced in discussions about privacy:
-Let us have a look at the case where Alice cheats, that is not
-knows the secret. She would still go into the cave and use,
-for example the $B$-segment of the tunnel. If now Bob says she
-should emerge from $B$, she is lucky. But if he says she
-should emerge from $A$ then Alice is in trouble: Bob will find
-out she does not actually know the secret. So in order to fool
-Bob she needs to anticipate his call, and already go into the
-corresponding tunnel. This of course also does not work, since
-Bob can make any call he wants. Consequently in order to find
-out whether Alice cheats, Bob just needs to repeat this
-protocol many times. Each time Alice has a chance of
-$\frac{1}{2}$ to be lucky or being found out. Iterating this
-$n$ times means she must be right every time and when
-cheating: the probability for this is $\frac{1}{2}^n$, number
-that for already relatively small $n$, say 10, is incredibly
-small.
+\begin{itemize}
+\item \textit{``You have zero privacy anyway. Get over
+it.''}\\
+\mbox{}\hfill{}{\small{}(by Scott Mcnealy, former CEO of Sun)}
+
+\item \textit{``If you have nothing to hide, you have nothing
+to fear.''}
+\end{itemize}
+
+\noindent If you like to watch a movie which has this topic as
+its main focus I recommend \emph{Gattaca} from
+1997.\footnote{\url{http://www.imdb.com/title/tt0119177/}} If
+you want to read up on this topic, I can recommend the
+following article that appeared in 2011 in the Chronicle of
+Higher Education:
+
+\begin{center}
+\url{http://chronicle.com/article/Why-Privacy-Matters-Even-if/127461/}
+\end{center}
+
+\noindent Funnily, or maybe not so funnily, the author of this
+article carefully tries to construct an argument that does not
+only attack the nothing-to-hide statement in cases where
+governments \& co collect people's deepest secrets, or
+pictures of people's naked bodies, but an argument that
+applies also in cases where governments ``only'' collect data
+relevant to, say, preventing terrorism. The fun is of course
+that in 2011 we could just not imagine that respected
+governments would do such infantile things as intercepting
+people's nude photos. Well, since Snowden we know some people
+at the NSA did exactly that and then shared such photos among
+colleagues as ``fringe benefit''.
-There are some interesting observations we can make about
-Alibaba's cave and the ZKP protocol between Alice and Bob:
+\subsubsection*{Re-Identification Attacks}
+
+Apart from philosophical musings, there are fortunately also
+some real technical problems with privacy. The problem I want
+to focus on in this handout is how to safely disclose datasets
+containing potentially very private data, say health records.
+What can go wrong with such disclosures can be illustrated
+with four well-known examples:
\begin{itemize}
-\item {\bf Completeness} If Alice knows the secret, Bob
- accepts Alice ``proof'' for sure. There is no error
- possible in that Bob thinks Alice cheats when she
- actually knows the secret.
-\item {\bf Soundness} If Alice does not know the secret,
- Bob accepts her ``proof'' with a very small probability.
- If, as in the example above, the probability of being
- able to hide cheating is $\frac{1}{2}$ in each round
- it will be $\frac{1}{2}^n$ after $n$-rounds, which even
- for small $n$ say $> 10$ is very small indeed.
-\item {\bf Zero-Knowledge} Even if Bob accepts
- the proof by Alice, he cannot convince anybody else.
-\end{itemize}
+\item In 2006, a then young company called Netflix offered a 1
+ Mio \$ prize to anybody who could improve their movie
+ rating algorithm. For this they disclosed a dataset
+ containing 10\% of all Netflix users at the time
+ (appr.~500K). They removed names, but included numerical
+ ratings of movies as well as times when ratings were
+ uploaded. Though some information was perturbed (i.e.,
+ slightly modified).
+
+ Two researchers had a closer look at this anonymised
+ data and compared it with public data available from the
+ International Movie Database (IMDb). They found that
+ 98\% of the entries could be re-identified in the
+ Netflix dataset: either by their ratings or by the dates
+ the ratings were uploaded. The result was a class-action
+ suit against Netflix, which was only recently resolved
+ involving a lot of money.
-\noindent The last property is the most interesting one.
-Assume Alice has convinced Bob that she knows the secret and
-Bob filmed the whole protocol with a camera. Can he use the
-video to convince anybody else? After a moment of thought, you
-will agree that this is not the case. Alice and Bob might have
-just made it all up and colluded by Bob telling Alice
-beforehand which tunnel he will call. In this way it appears
-as if all iterations of the protocol were successful, but they
-prove nothing. If another person wants to find out whether
-Alice knows the secret, he or she would have to conduct the
-protocol again. This is actually the formal definition of a
-zero-knowledge proof: an independent observer cannot
-distinguish between a real protocol (where Alice knows the
-secret) and a fake one (where Bob and Alice colluded).
+\item In the 1990ies, medical datasets were often made public
+ for research purposes. This was done in anonymised form
+ with names removed, but birth dates, gender and ZIP-code
+ were retained. In one case where such data about
+ hospital visits of state employees in Massachusetts was
+ made public, the then governor assured the public that
+ the released dataset protected patient privacy by
+ deleting identifiers.
+
+ A graduate student could not resist cross-referencing
+ public voter data with the released data that still
+ included birth dates, gender and ZIP-code. The result
+ was that she could send the governor his own hospital
+ record. It turns out that birth dates, gender and
+ ZIP-code uniquely identify 87\% of people in the US.
+ This work resulted in a number of laws prescribing which
+ private data cannot be released in such datasets.
+
+\item In 2006, AOL published 20 million Web search queries
+ collected from 650,000 users (names had been deleted).
+ This was again done for research purposes. However,
+ within days an old lady, Thelma Arnold, from Lilburn,
+ Georgia, (11,596 inhabitants) was identified as user
+ No.~4417749 in this dataset. It turned out that search
+ engine queries are deep windows into people's private
+ lives.
+
+\item Genome-Wide Association Studies (GWAS) was a public
+ database of gene-frequency studies linked to diseases.
+ It would essentially record that people who have a
+ disease, say diabetes, have also certain genes. In order
+ to maintain privacy, the dataset would only include
+ aggregate information. In case of DNA data this
+ aggregation was achieved by mixing the DNA of many
+ individuals (having a disease) into a single solution.
+ Then this mixture was sequenced and included in the
+ dataset. The idea was that the aggregate information
+ would still be helpful to researchers, but would protect
+ the DNA data of individuals.
+
+ In 2007 a forensic computer scientist showed that
+ individuals can still be identified. For this he used
+ the DNA data from a comparison group (people from the
+ general public) and ``subtracted'' this data from the
+ published data. He was left with data that included all
+ ``special'' DNA-markers of the individuals present in
+ the original mixture. He essentially deleted the
+ ``background noise'' in the published data. The problem
+ with DNA data is that it is of such a high resolution
+ that even if the mixture contained maybe 100
+ individuals, you can with current technology detect
+ whether an individual was included in the mixture or
+ not.
+
+ This result changed completely how DNA data is nowadays
+ published for research purposes. After the success of
+ the human-genome project with a very open culture of
+ exchanging data, it became much more difficult to
+ anonymise data so that patient's privacy is preserved.
+ The public GWAS database was taken offline in 2008.
+
+\end{itemize}
+
+\noindent There are many lessons that can be learned from
+these examples. One is that when making datasets public in
+anonymised form, you want to achieve \emph{forward privacy}.
+This means, no matter what other data that is also available
+or will be released later, the data in the original dataset
+does not compromise an individual's privacy. This principle
+was violated by the availability of ``outside data'' in the
+Netflix and governor of Massachusetts cases. The additional
+data permitted a re-identification of individuals in the
+dataset. In case of GWAS a new technique of re-identification
+compromised the privacy of people in the dataset. The case of
+the AOL dataset shows clearly how incomplete such data can be:
+Although the queries uniquely identified the older lady, she
+also looked up diseases that her friends had, which had
+nothing to do with her. Any rational analysis of her query
+data must therefore have concluded, the lady is on her
+death bed, while she was actually very much alive and kicking.
+
+In 2016, Yahoo released the so far largest machine learning
+dataset to the research community. It includes approximately
+13.5 TByte of data representing around 100 Billion events from
+anonymized user-news items, collected by recording
+interactions of about 20M users from February 2015 to May
+2015. Yahoo's gracious goal is to promote independent research
+in the fields of large-scale machine learning and recommender
+systems. It remains to be seen whether this data will really
+only be used for that purpose.
+
+\subsubsection*{Differential Privacy}
+
+Differential privacy is one of the few methods that tries to
+achieve forward privacy. The basic idea is to add appropriate
+noise, or errors, to any query of the dataset. The intention
+is to make the result of a query insensitive to individual
+entries in the database. That means the results are
+approximately the same no matter if a particular individual is
+in the dataset or not. The hope is that the added error does
+not eliminate the ``signal'' one is looking for in the
+dataset.
+
+%\begin{center}
+%User\;\;\;\;
+%\begin{tabular}{c}
+%tell me $f(x)$ $\Rightarrow$\\
+%$\Leftarrow$ $f(x) + \text{noise}$
+%\end{tabular}
+%\;\;\;\;\begin{tabular}{@{}c}
+%Database\\
+%$x_1, \ldots, x_n$
+%\end{tabular}
+%\end{center}
+%
+%\begin{center}
+%\begin{tabular}{l|l}
+%Staff & Salary\\\hline
+%$PM$ & \pounds{107}\\
+%$PF$ & \pounds{102}\\
+%$LM_1$ & \pounds{101}\\
+%$LF_2$ & \pounds{97}\\
+%$LM_3$ & \pounds{100}\\
+%$LM_4$ & \pounds{99}\\
+%$LF_5$ & \pounds{98}
+%\end{tabular}
+%\end{center}
+%
+%
+%\begin{center}
+%\begin{tikzpicture}
+%\begin{axis}[symbolic y coords={salary},
+% ytick=data,
+% height=3cm]
%\addplot+[jump mark mid] coordinates
%{(0,salary) (0.1,salary)
% (0.4,salary) (0.5,salary)
% (0.8,salary) (0.9,salary)};
%\end{axis}
%\end{tikzpicture}
+%\end{center}
+%
+%\begin{tikzpicture}[outline/.style={draw=#1,fill=#1!20}]
% \node [outline=red] {red box};
% \node [outline=blue] at (0,-1) {blue box};
%\end{tikzpicture}
+
+\ldots
-\subsubsection*{Using an Graph-Isomorphism Problem for ZKPs}
-
-Now the question is how can we translate Alibaba's cave into a
-computer science solution? It turns out we need an NP problem
-for that. The main feature of an NP problem is that it is
-computational very hard to generate a solution, but it is very
-easy to check whether a given solution indeed solves the
-problem at hand.\footnote{The question whether $P = NP$ or not,
-we leave to others to speculate about.}
-
-One NP problem is the \emph{graph isomorphism problem}. It
-essentially determines whether the following two graphs, say
-$G_1$ and $G_2$, can be moved and stretched so that they look
-exactly the same.
+\subsubsection*{Further Reading}
-\begin{center}
-\begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c}
-$G_1$ & $G_2$\\
-\raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple.png}} &
-\raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple-b.png}}&
+Two cool articles about how somebody obtained via the Freedom
+of Information Law the taxicab dataset of New York and someone
+else showed how easy it is to mine for private information:
-\footnotesize
-\begin{tabular}{rl}
-Graph $G_1$ & Graph $G_2$\\
-a & 1\\
-b & 6\\
-c & 8\\
-d & 3\\
-g & 5\\
-h & 2\\
-i & 4\\
-j & 7\\
-\end{tabular}
+\begin{center}\small
+\begin{tabular}{p{0.78\textwidth}}
+\url{http://chriswhong.com/open-data/foil_nyc_taxi/}\smallskip\\
+\url{http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset}
\end{tabular}
\end{center}
-\noindent The table on the right gives a mapping of the nodes
-of the first graph to the nodes of the second. With this
-mapping we can check: node $a$ is connected in $G_1$ with $g$,
-$h$ and $i$. Node $a$ maps to node $1$ in $G_2$, which is
-connected to $2$, $4$ and $5$, which again correspond via the
-mapping to $h$, $i$ and $g$ respectively. Let us write
-$\sigma$ for such a table and let us write
-
-\[G_1 = \sigma(G_2)\]
-
-\noindent for two isomorphic graphs ($\sigma$ being the
-isomorphism). It is actually very easy to construct two
-isomorphic graphs: Start with an arbitrary graph, re-label the
-nodes consistently. Alice will need to do this frequently
-for the protocol below. In order to be useful, we therefore
-would need to consider substantially larger graphs, like
-with thousand nodes.
-
-Now the secret which Alice tries to hide is the knowledge of
-such an isomorphism $\sigma$ between two such graphs. But she
-can convince Bob whether she knows such an isomorphism for two
-graphs, say $G_1$ and $G_2$, using the same idea as in the
-example with Alibaba's cave. For this Alice and Bob must
-follow the following protocol:
-
-\begin{enumerate}
-\item Alice generates an isomorphic graph $H$ which she sends
- to Bob (in each iteration she needs to generate a
- different $H$).
-\item
- After receiving $H$, Bob asks Alice either for an
- isomorphism between $G_1$ and $H$, or $G_2$ and $H$.
-\item Alice and Bob repeat this procedure $n$ times.
-\end{enumerate}
-
-\noindent In Step 1 it is important that Alice always
-generates a fresh isomorphic graph. I let you think what
-would happen if Alice sends out twice the same graph $H$.
-
-As said before, this is relatively easy to generate by
-consistently relabelling nodes. If she started from $G_1$,
-Alice will have generated
-
-\begin{equation}
-H = \sigma'(G_1)\label{hiso}
-\end{equation}
-
-\noindent where $\sigma'$ is the isomorphism between $H$ and
-$G_1$. But she could equally have started from $G_2$. In the
-case where $G_1$ and $G_2$ are isomorphic, if $H$ is
-isomorphic with $G_1$, it will also be isomorphic with
-$G_2$, and vice versa.
-
-After generating $H$, Alice sends it to Bob. The important
-point is that she needs to ``commit'' to this $H$, therefore
-this kind of zero-knowledge protocols are called
-\emph{commitment protocols}. Only after receiving $H$, Bob
-will make up his mind about which isomorphism he asks
-for---whether between $H$ and $G_1$ or $H$ and $G_2$. For this
-he could flip a coin, since the choice should be as
-unpredictable for Alice as possible. Once Alice receives the
-request, she has to produce an isomorphism. If she generated
-$H$ as shown in \eqref{hiso} and is asked for an isomorphism
-between $H$ and $G_1$, she just sends $\sigma'$. If she had
-been asked for an isomorphism between $H$ and $G_2$, she just
-has to compose her secret isomorphism $\sigma$ and $\sigma'$.
-The main point for the protocol is that even knowing the
-isomorphism between $H$ and $G_1$ or $H$ and $G_2$, will not
-make the task easier to find the isomorphism between $G_1$ and
-$G_2$, which is the secret Alice tries to protect.
-
-In order to make it crystal clear how this protocol proceeds,
-let us give a version using our more formal notation for
-protocols:
+\noindent
+A readable article about how supermarkets mine your shopping
+habits (especially how they prey on new exhausted parents
+;o) appeared in 2012 in the New York Times:
\begin{center}
-\begin{tabular}{lrl}
-0) & $A \to B:$ & $G_1$ and $G_2$\\
-1a) & $A \to B:$ & $H_1$ \\
-1b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_1$?
- \quad(or $G_2 \leftrightarrow H_1$)\\
-1c) & $A \to B:$ & requested isomorphism\\
-2a) & $A \to B:$ & $H_2$\\
-2b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_2$?
- \quad(or $G_2 \leftrightarrow H_2$)\\
-2c) & $A \to B:$ & requested isomorphism\\
- & \ldots\\
-\end{tabular}
+\url{http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html}
\end{center}
-\noindent As can be seen the protocol runs for some
-agreed number of iterations. The $H_i$ Alice needs to
-produce, need to be all distinct. I hope you now know
-why?
-
-It is also crucial that in each iteration, Alice first sends
-$H_i$ and then Bob can decide which isomorphism he wants:
-either $G_1 \leftrightarrow H_i$ or $G_2 \leftrightarrow H_i$.
-If somehow Alice can find out before she committed to $H_i$,
-she can cheat. For this assume Alice does \emph{not} know an
-isomorphism between $G_1$ and $G_2$. If she knows which
-isomorphism Bob will ask for she can craft $H$ in such a way
-that it is isomorphism with either $G_1$ or $G_2$ (but it
-cannot with both). Then in each case she would send Bob
-a correct answer and he would come to the conclusion that
-all is well. I let you also answer the question whether
-such a protocol run between Alice and Bob would convince
-a third person, say Pete.
-
-Since the interactive nature of the above PKZ protocol and the
-correct ordering of the messages is so important for the
-``correctness'' of the protocol, it might look surprising that
-the same goal can also me achieved in a completely offline
-manner. By this I mean Alice can publish all data at once, and
-then at a later time, Bob can inspect the data and come to the
-conclusion whether or not Alice knows the secret (again
-without actually learning about the secret). For this
-Alice has to do the following:
+\noindent An article that analyses privacy and shopping habits
+from a more economic point of view is available from:
-\begin{enumerate}
-\item Alice generates $n$ isomorphic graphs
- $H_{1..n}$ (they need to be all distinct)
-\item she feeds the $H_{1..n}$ into a hashing function
- (for example encoded as as matrix)
-\item she takes the first $n$ bits of the output of the hashing
- function:
- whenever the output is $0$, she shows an isomorphism
- with $G_1$; for $1$ she shows an isomorphism
- with $G_2$
-\end{enumerate}
+\begin{center}
+\url{http://www.dtc.umn.edu/~odlyzko/doc/privacy.economics.pdf}
+\end{center}
-\noindent The reason why this works and achieves the same
-goal as the interactive variant is that Alice has no
-control over the hashing function. It would be
-computationally just too hard to assemble a set of
-$H_{1..n}$ such that she can force where $0$s and $1$s
-in the hash values are such that it would pass an external
-test. The point is that Alice can publish all this data
-on the comfort of her own web-page, for example, and
-in this way convince everybody who bothers to check.
-
-The virtue of the use of graphs and isomorphism for a
-zero-knowledge protocol is that the idea why it works
-are relatively straightforward. The disadvantage is
-that encoding any secret into a graph-isomorphism, while
-possible, is awkward. The good news is that in fact
-any NP problem can be used as part of a ZKP protocol.
-
-
-\subsubsection*{Using Modular Logarithms for ZKP Protocols}
-
-While information can be encoded into graph isomorphisms, it
-is not the most convenient carrier of information. Clearly it
-is much easier to encode information into numbers. Let us look
-at zero-knowledge proofs that use numbers as secrets. For this
-the underlying NP-problem is to calculate discrete logarithms.
-It can be used by choosing public numbers $A$, $B$, $p$, and
-private $x$ such that
+\noindent An attempt to untangle the web of current technology
+for spying on consumers is published in:
\begin{center}
-$A^x \equiv B\; mod\; p$
-\end{center}
-
-\noindent holds. The secret Alice tries to keep secret is $x$.
-The point of the modular logarithm is that it is very hard
-from the public data to calculate $x$ (for large primes).
-Now the protocol proceeds in three stages:
-
-\begin{itemize}
-\item {\bf Commitment Stage}
- \begin{enumerate}
- \item Alice generates $z$ random numbers $r_1, \ldots, r_z$,
- all less than $p - 1$. Alice then sends Bob for all
- $i = 1,\ldots, z$:
- \[ h_i = A^{r_i}\; mod\; p\]
- \item Bob generates $z$ random bits, say $b_1,\ldots, b_z$. He can do this
- by flipping $z$ times a coin, for example.
- \item For each bit $b_i$, Alice sends Bob an $s_i$ where
-
- \begin{center}
- \begin{tabular}{ll}
- if $b_i = 0$: & $s_i = r_i$\\
- if $b_i = 1$: & $s_i = (r_i - r_j) \;mod\; (p -1)$\\
- \end{tabular}
- \end{center}
+\url{http://cyberlaw.stanford.edu/files/publication/files/trackingsurvey12.pdf}
+\end{center}
- where $r_j$ is the lowest $j$ where $b_j = 1$.
-
- \end{enumerate}
- \end{itemize}
-
-\noindent For understanding the last step, let $z$ be just 4.
-We have four random values $r_i$ chosen by Alice and four
-random bits $b_i$ chosen subsequently by Bob, for example
-
- \begin{center}
- \begin{tabular}{lcccc}
- $r_i$:\; & 4 & 9 & 1 & 3\\
- $b_i$:\; & 0 & 1 & 0 & 1\\
- & & $\uparrow$ \\
- & & $j$
- \end{tabular}
- \end{center}
-
- \noindent The highlighted column is the lowest where $b_i =
- 1$ (counted from the left). That means $r_j = 9$. The reason
- for letting Alice choose the random numbers $r_1, \ldots, r_z$
- will become clear shortly. Next is the confirmation
- phase where Bob essentially checks whether Alice has sent
- him ``correct'' $s_i$ and $h_i$.
-
-\begin{itemize}
-\item {\bf Confirmation Stage}
- \begin{enumerate}
- \item For each $b_i$ Bob checks whether $s_i$
- conform to the protocol
-
- \begin{center}
- \begin{tabular}{ll}
- if $b_i = 0$: & $A^{s_i} \stackrel{?}{\equiv} h_i\;mod\;p$\\
- if $b_i = 1$: & $A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1} \;mod\; p$\\
- \end{tabular}
- \end{center}
- \end{enumerate}
-\end{itemize}
-
-\noindent To understand the case for $b_i = 1$, you have
-to do the following calculation:
+\noindent An article that sheds light on the paradox that
+people usually worry about privacy invasions of little
+significance, and overlook the privacy invasion that might
+cause significant damage:
\begin{center}
-\begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l}
-$A^{s_i}$ & $=$ & $A^{r_i - r_j}$\\
- & $=$ & $A^{r_i} * A^{-r_j}$\\
- & $=$ & $h_{r_i} * h_{r_j}^{-1}\;mod\;p$
-\end{tabular}
-\end{center}
-
-\noindent What is interesting that so far nothing has been
-sent about $x$, which is the secret Alice has. Also notice
-that Bob does not know $r_j$. He received
-
-\begin{center}
-$r_j - r_j$, $r_m - r_j$, \ldots, $r_p - r_j \;mod \;p - 1$
-\end{center}
-
-\noindent whenever his corresponding bits were $1$. So Bob
-does not know $r_j$ and also does not know any $r_i$ where the
-bit was $1$. Information about the $x$ is sent in the next
-stage (obviously not revealing $x$).
-
-\begin{itemize}
-\item {\bf Proving Stage}
-
-\begin{enumerate}
-\item Alice proves she knows $x$, the discrete log of $B$,
-by sending
-
-\begin{center}
-$s_{z+1} = x - r_j\;mod\;p-1$
+\url{http://www.heinz.cmu.edu/~acquisti/papers/Acquisti-Grossklags-Chapter-Etrics.pdf}
\end{center}
-\item Bob confirms
+
+Interesting ideas
\begin{center}
-$A^{s_{z+1}} \stackrel{?}{\equiv} B * h_j^{-1} \;mod \; p$
+\url{https://adnauseam.io}
\end{center}
-\end{enumerate}
-\end{itemize}
-
-\noindent To understand the last step, you have to do the trick
-again that
-
-\[A^{s_{z+1}} = A^{x-r_j} = \ldots
-\]
-
-\noindent
-which I leave to you.
-
-Now the question is how can Alice cheat? In order to cheat she
-has to coordinate what she sends as $h_i$ in step 1 and $s_i$
-in step 3 of the commitment stage, and also what to send as
-$s_{z+1}$ in the proving stage. For the latter of course
-Alice does not know $x$, so she just chooses some random
-number for $s_{z+1}$ and calculates
-
-\[A^{s_{z+1}}\]
\noindent
-and then solves the equation
-
-\[A^{s_{z+1}} \equiv B * y \;mod\;p\]
-
-\noindent for $y$. This is easy since no logarithm needs to be
-computed. If Alice can guess the $j$ where the first 1 will
-appear in Bob's bit vector, then she sends the inverse of $y$
-as $h_j$ and 0 as $s_j$. However, notice that when she
-calculates a solution for $y$ she does not know $r_j$. For this she
-would need to calculate the modular logarithm
-
-\[y \equiv A^{r_j}\;mod\;p\]
-
-\noindent which is hard (see step 1 in the commitment stage).
-
-Having settled on what $h_j$ should be, now what should Alice
-send as the other $h_i$ and other $s_i$? If the $b_i$ happens
-to be a 1, then the $h_i$ and other $s_i$ need to satisfy the
-test
-
-\[A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1} \;mod\; p\]
-
-\noindent where she has already settled on the value of
-$h_j^{-1}$. Lets say she choses $s_i$ at random, then she just
-needs to solve
-
-\[A^{s_i} \equiv z * h_j^{-1} \;mod\; p\]
-
-\noindent for $z$. Again that is easy, but it does not allow
-us to know $r_i$, because then we would again need to solve
-a modular logarithm problem. Let us call an $h_i$ which was
-solved the easy way as \emph{bogus}. Alice has to produce
-bogus $h_i$ for all bits that are going to be $1$ in advance!
-This means she has to guess all the bits correctly. (Yes?
-I let you think about this.)
-
-Let us see what happens if she guesses wrongly: Suppose the
-bit $b_i = 1$ where she thought she will get a 0. Then she has
-already sent an $h_i$ and $h_j$ and now must find an $s_i$
-such that
-
-\[A^{s_i} \equiv h_i * h_j^{-1} \;mod\; p\]
-
-\noindent holds. For this remember in calculating $h_i$, she
-just chose a random $s_i$. Now she has to send a genuine one.
-But this is of course too hard. If she knew the genuine $r_i$
-and $r_j$ for $h_i$ and $h_j$, it would be easy (in this case
-$s_i = r_i - r_j$). But she does not. So she will be found
-out. If $b_i = 0$, but she thought she will get a 1, then
-she has to send a $s_i$ which satisfies
-
-\[A^{s_i} \equiv h_i\;mod\;p\]
-
-\noindent Again she does not know $r_i$. So it is a too hard
-task and she will be found out again.
-
-To sum up, in order for Alice to successfully cheat Bob, she
-needs to guess \emph{all} bits correctly. She has only a
-$\frac{1}{2^z}$ chance of doing this.
-
-\subsubsection*{Further Reading}
-
-Make sure you understand what NP problems
-are.\footnote{\url{http://en.wikipedia.org/wiki/NP_(complexity)}}
-They are the building blocks for zero-knowledge proofs.
-Zero-Knowldege proofs are not yet widely used in production
-systems, but it is slowly gaining ground. One area of application
-where they pop up is crypto currencies (for example Zerocoins
-or how to make sure a Bitcoin exchange is solvent without
-revealing its assets).
-
-If you want to brush up on the modular logarithm problem,
-the Khan Academy has a nice video:
+And a paper that predicts ad-blockers will in the end win over anti-ad-blocking:
\begin{center}
-\url{https://www.khanacademy.org/video/discrete-logarithm-problem}
+\url{http://randomwalker.info/publications/ad-blocking-framework-techniques.pdf}
\end{center}
+
\end{document}
-http://blog.cryptographyengineering.com/2014/11/zero-knowledge-proofs-illustrated-primer.html
-
-http://btravers.weebly.com/uploads/6/7/2/9/6729909/zero_knowledge_technique.pdf
-http://zk-ssh.cms.ac/docs/Zero_Knowledge_Prinzipien.pdf
-http://www.wisdom.weizmann.ac.il/~oded/PS/zk-tut02v4.ps
-
-socialist millionares problem
-http://en.wikipedia.org/wiki/Socialist_millionaire
-http://twistedoakstudios.com/blog/Post3724_explain-it-like-im-five-the-socialist-millionaire-problem-and-secure-multi-party-computation
-
+http://randomwalker.info/teaching/fall-2012-privacy-technologies/?
+http://chronicle.com/article/Why-Privacy-Matters-Even-if/127461/
+http://repository.cmu.edu/cgi/viewcontent.cgi?article=1077&context=hcii
+https://josephhall.org/papers/NYU-MCC-1303-S2012_privacy_syllabus.pdf
+http://www.jetlaw.org/wp-content/uploads/2014/06/Bambauer_Final.pdf
+http://www.cs.cmu.edu/~yuxiangw/docs/Differential%20Privacy.pdf
+https://www.youtube.com/watch?v=Gx13lgEudtU
+https://www.cs.purdue.edu/homes/ctask/pdfs/CERIAS_Presentation.pdf
+http://www.futureofprivacy.org/wp-content/uploads/Differential-Privacy-as-a-Response-to-the-Reidentification-Threat-Klinefelter-and-Chin.pdf
+http://www.cis.upenn.edu/~aaroth/courses/slides/Overview.pdf
+http://www.cl.cam.ac.uk/~sjm217/papers/tor14design.pdf
%%% Local Variables:
%%% mode: latex
Binary file handouts/ho07.pdf has changed
--- a/handouts/ho07.tex Sat Sep 23 19:39:53 2017 +0100
+++ b/handouts/ho07.tex Sat Sep 23 19:52:27 2017 +0100
@@ -1,561 +1,1011 @@
\documentclass{article}
\usepackage{../style}
\usepackage{../graphics}
-
-\begin{document}
-\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015}
+\usepackage{../langs}
+\usepackage{../data}
-%https://www.theguardian.com/technology/2016/oct/04/yahoo-secret-email-program-nsa-fbi
-%https://nakedsecurity.sophos.com/2015/11/12/california-collects-owns-and-sells-infants-dna-samples/
-%http://randomwalker.info/teaching/fall-2012-privacy-technologies/?
-%https://josephhall.org/papers/NYU-MCC-1303-S2012_privacy_syllabus.pdf
-%http://www.jetlaw.org/wp-content/uploads/2014/06/Bambauer_Final.pdf
-%http://www.cs.cmu.edu/~yuxiangw/docs/Differential%20Privacy.pdf
-%https://www.youtube.com/watch?v=Gx13lgEudtU
-%https://fpf.org/wp-content/uploads/Differential-Privacy-as-a-Response-to-the-Reidentification-Threat-Klinefelter-and-Chin.pdf
-%http://research.neustar.biz/2014/09/08/differential-privacy-the-basics/
+%https://crypto.stanford.edu/cs251/
+%https://programmingblockchain.gitbooks.io/programmingblockchain/content/
-%=====
-%Tim Greene, Network World, 17 Dec 2015 (via ACM TechNews, 18 Dec 2015)
-%
-%Massachusetts Institute of Technology (MIT) researchers' experimental
-%Vuvuzela messaging system offers more privacy than The Onion Router (Tor) by
-%rendering text messages sent through it untraceable. MIT Ph.D. student
-%David Lazar says Vuvuzela resists traffic analysis attacks, while Tor
-%cannot. The researchers say the system functions no matter how many parties
-%are using it to communicate, and it employs encryption and a set of servers
-%to conceal whether or not parties are participating in text-based dialogues.
-%"Vuvuzela prevents an adversary from learning which pairs of users are
-%communicating, as long as just one out of [the] servers is not compromised,
-%even for users who continue to use Vuvuzela for years," they note. Vuvuzela
-%can support millions of users hosted on commodity servers deployed by a
-%single group of users. Instead of anonymizing users, Vuvuzela prevents
-%outside observers from differentiating between people sending messages,
-%receiving messages, or neither, according to Lazar. The system imposes
-%noise on the client-server traffic which cannot be distinguished from actual
-%messages, and all communications are triple-wrapped in encryption by three
-%servers. "Vuvuzela guarantees privacy as long as one of the servers is
-%uncompromised, so using more servers increases security at the cost of
-%increased message latency," Lazar notes.
-%http://orange.hosting.lsoft.com/trk/click?ref=znwrbbrs9_5-e70bx2d991x066779&
+% bug in smart contracts
+% https://www.benthamsgaze.org/2016/06/17/smart-contracts-beyond-the-age-of-innocence/
+% http://hackingdistributed.com/2016/06/18/analysis-of-the-dao-exploit/
-%%%%
-%% canvas tracking
-%%https://freedom-to-tinker.com/blog/englehardt/the-princeton-web-census-a-1-million-site-measurement-and-analysis-of-web-privacy/
+% hard forks
+% https://github.com/bitcoin/bips/blob/master/bip-0050.mediawiki
-%%%
-%% cupit re-identification attack
-%% https://nakedsecurity.sophos.com/2016/05/20/published-personal-data-on-70000-okcupid-users-taken-down-after-dmca-order/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+nakedsecurity+%28Naked+Security+-+Sophos%29
-
-%Differential privacy
-%=====================
-%https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/
+% only 25% needed to obtain larger shares of mining
+% http://www.cs.cornell.edu/~ie53/publications/btcProcFC.pdf
-%Differential privacy, translated from Apple-speak, is the
-%statistical science of trying to learn as much as possible
-%about a group while learning as little as possible about any
-%individual in it.
+% re-identification attacks
+% https://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf
-%As Roth notes when he refers to a “mathematical proof,”
-%differential privacy doesn’t merely try to obfuscate or
-%“anonymize” users’ data. That anonymization approach, he
-%argues, tends to fail. In 2007, for instance, Netflix released
-%a large collection of its viewers’ film ratings as part of a
-%competition to optimize its recommendations, removing people’s
-%names and other identifying details and publishing only their
-%Netflix ratings. But researchers soon cross-referenced the
-%Netflix data with public review data on IMDB to match up
-%similar patterns of recommendations between the sites and add
-%names back into Netflix’s supposedly anonymous database.
+% bit-coin papers
+% https://crypto.stanford.edu/cs251/syllabus.html
+
+% bit coin talk --- at 20:00 mins
+%https://www.usenix.org/conference/lisa16/conference-program/presentation/perlman
-%As an example of that last method, Microsoft’s Dwork points to
-%the technique in which a survey asks if the respondent has
-%ever, say, broken a law. But first, the survey asks them to
-%flip a coin. If the result is tails, they should answer
-%honestly. If the result is heads, they’re instructed to flip
-%the coin again and then answer “yes” for heads or “no” for
-%tails. The resulting random noise can be subtracted from the
-%results with a bit of algebra, and every respondent is
-%protected from punishment if they admitted to lawbreaking.
-
-%https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
-
-% Windows 10 data send back to Microsoft (Cortana)
-%Here’s a non-exhaustive list of data sent back: location data, text
-%input, voice input, touch input, webpages you visit, and telemetry
-%data regarding your general usage of your computer, including which
-%programs you run and for how long.
-
-% Businesses are already using customised pricing online based on
-% information they can glean about you. It is hard to know how
-% widespread the practice is; companies keep their pricing strategies
-% closely guarded and are wary of the bad PR price discrimination
-% could pose. However, it is clear that a number of large retailers
-% are experimenting with it. Staples, for example, has offered
-% discounted prices based on whether rival stores are within 20 miles
-% of its customers’ location. Office Depot has admitted to using its
-% customers’ browsing history and location to vary its range of offers
-% and products. A 2014 study from Northeastern University found
-% evidence of “steering” or differential pricing at four out of 10
-% general merchandise websites and five out of five travel
-% websites. (Steering is when a company doesn’t give you a customised
-% price, but points you towards more expensive options if it thinks
-% you will pay more.) The online travel company Orbitz raised
-% headlines in 2012 when it emerged that the firm was pointing Mac
-% users towards higher-priced hotel rooms than PC users.
+% In fact, far from freeing people from the oppression of the state,
+% blockchains perversely promise the perfect tool for a fully
+% auditable, tax compliant, cashless society. Similarly, the belief it
+% is an anonymous digital cash has quickly vanished and we are now
+% seeing a large number of analytics companies, set-up specifically to
+% work with law enforcement agencies, to police this new parallel
+% financial system.
+%
+% But today blockchain is riddled with
+% contradictions and misunderstandings. Most of its problems are very
+% fixable, if you want to fix them
-%%% government will overwrite your wishes if it is annoymous
-%% https://www.lightbluetouchpaper.org/2016/12/05/government-u-turn-on-health-privacy/
-
-%% corporate surveilance / privacy - report and CC3C talk
-%% http://crackedlabs.org/en/networksofcontrol
-%% https://media.ccc.de/v/33c3-8414-corporate_surveillance_digital_tracking_big_data_privacy#video&t=2933
-
-\section*{Handout 6 (Privacy)}
+% history of bitcoins
+% https://futurism.com/images/this-week-in-tech-jan-15-22-2016/
-The first motor car was invented around 1886. For ten years,
-until 1896, the law in the UK (and elsewhere) required a
-person to walk in front of any moving car waving a red flag.
-Cars were such a novelty that most people did not know what to
-make of them. The person with the red flag was intended to
-warn the public, for example horse owners, about the impending
-novelty---a car. In my humble opinion, we are at the same
-stage of development with privacy. Nobody really knows what it
-is about or what it is good for. All seems very hazy. There
-are a few laws (e.g.~cookie law, right-to-be-forgotten law)
-which address problems with privacy, but even if they are well
-intentioned, they either back-fire or are already obsolete
-because of newer technologies. The result is that the world of
-``privacy'' looks a little bit like the old Wild
-West---lawless and mythical.
+\begin{document}
+\fnote{\copyright{} Christian Urban, 2014, 2015}
-We would have hoped that after Snowden, Western governments
-would be a bit more sensitive and enlightned about the topic
-of privacy, but this is far from the truth. Ross Anderson
-wrote the following in his blog\footnote{\url{https://www.lightbluetouchpaper.org/2016/02/11/report-on-the-ip-bill/}} about the approach taken in
-the US to lessons learned from the Snowden leaks and contrasts
-this with the new snooping bill that is considered in the UK
-parliament:
+\section*{Handout 7 (Bitcoins)}
-\begin{quote}\it
-``The comparison with the USA is stark. There, all three
-branches of government realised they'd gone too far after
-Snowden. President Obama set up the NSA review group, and
-implemented most of its recommendations by executive order;
-the judiciary made changes to the procedures of the FISA
-Court; and Congress failed to renew the data retention
-provisions in the Patriot Act (aided by the judiciary). Yet
-here in Britain the response is just to take Henry VIII powers
-to legalise all the illegal things that GCHQ had been up to,
-and hope that the European courts won't strike the law down
-yet again.''
-\end{quote}
+In my opinion Bitcoins are an elaborate Ponzi
+scheme\footnote{\url{http://en.wikipedia.org/wiki/Ponzi_scheme}}---still
+the ideas behind them are really beautiful and not too
+difficult to understand. Since many colourful claims about
+Bitcoins float around in the mainstream and not-so-mainstream
+media, it will be instructive to re-examine such claims from a
+more technically informed vantage point. For example, it is
+often claimed that Bitcoins are anonymous and free from any
+potential government meddling. It turns out that the first
+claim ignores a lot of research in de-anonymising social
+networks, and the second underestimates the persuasive means a
+government has at its disposal.
-\noindent Unfortunately, also big organisations besides
-governments seem to take an unenlightened approach to privacy.
-For example, UCAS, a charity set up to help students with
-applying to universities in the UK, has a commercial unit that
-happily sells your email addresses to anybody who forks out
-enough money for bombarding you with spam. Yes, you can opt
-out very often from such ``schemes'', but in case of UCAS any
-opt-out will limit also legit emails you might actually be
-interested in.\footnote{The main objectionable point, in my
-opinion, is that the \emph{charity} everybody has to use for
-HE applications has actually very honourable goals
-(e.g.~assist applicants in gaining access to universities),
-but the small print (or better the link ``About us'') reveals
-they set up their organisation so that they can also
-shamelessly sell the email addresses they ``harvest''.
-Everything is of course very legal\ldots{}ethical?\ldots{}well
-that is in the eye of the beholder. See:
+There are a lot of articles, blogposts, research papers
+etc.~available about Bitcoins. Below I will follow closely the
+very readable explanations from
+
+\begin{center}
+\url{http://www.michaelnielsen.org/ddi/how-the-bitcoin-protocol-actually-works/} \;\;and\smallskip\\
+\url{http://www.imponderablethings.com/2013/07/how-bitcoin-works-under-hood.html}
+\end{center}
-\url{http://www.ucas.com/about-us/inside-ucas/advertising-opportunities}
-or
-\url{http://www.theguardian.com/uk-news/2014/mar/12/ucas-sells-marketing-access-student-data-advertisers}}
+\noindent The latter also contains a link to a nice youtube
+video about the technical details behind Bitcoins. I will
+also use some of their pictures.
-Another example: Verizon, an ISP who is supposed to provide
-you just with connectivity, has found a ``nice'' side-business
-too: When you have enabled all privacy guards in your browser
-(the few you have at your disposal), Verizon happily adds a
-kind of cookie to your
-HTTP-requests.\footnote{\url{http://webpolicy.org/2014/10/24/how-verizons-advertising-header-works/}}
-As shown in the picture below, this cookie will be sent to
-every web-site you visit. The web-sites then can forward the
-cookie to advertisers who in turn pay Verizon to tell them
-everything they want to know about the person who just made
-this request, that is you.
-
+Let us start with the question who invented Bitcoins? You
+could not make up the answer, but we actually do not know who
+the inventor is. All we know is that the first paper
+
\begin{center}
-\includegraphics[scale=0.16]{../pics/verizon.png}
+\url{https://bitcoin.org/bitcoin.pdf}
\end{center}
-\noindent How disgusting! Even worse, Verizon is not known for
-being the cheapest ISP on the planet (completely the
-contrary), and also not known for providing the fastest
-possible speeds, but rather for being among the few ISPs in
-the US with a quasi-monopolistic ``market distribution''.
+\noindent is signed by Satoshi Nakamoto, which however is
+likely only a pen name. There is a lot of speculation who
+could be the inventor, or inventors, but we simply do not
+know. This part of Bitcoins is definitely anonymous so far.
+The paper above is from the end of 2008; the first Bitcoin
+transaction was made in January 2009. The rules in Bitcoin are
+set up so that there will only ever be 21 Million Bitcoins
+with the maximum reached around the year 2140. Currently there
+are already 11 Million Bitcoins in `existence'. Contrast this
+with traditional fiat currencies where money can be printed
+almost at will. The smallest unit of a Bitcoin is called a
+Satoshi, which is the $10^{-8}$th part of a Bitcoin. Remember
+a Penny is the $10^{-2}$th part of a Pound.
+The two main cryptographic building blocks of Bitcoins are
+cryptographic hashing functions (SHA-256) and public-private
+keys using the elliptic-curve encryption scheme for digital
+signatures. Hashes are used to generate `fingerprints' of data
+that ensure integrity (absence of tampering). Public-private
+keys are used for signatures. For example sending a message,
+say $msg$, together with the encrypted version
+
+\[
+msg, \{msg\}_{K^{priv}}
+\]
+
+\noindent allows everybody with access to the corresponding
+public key $K^{pub}$ to verify that the message came from the
+person who knew the private key. Signatures are used in
+Bitcoins for verifying the addresses where the Bitcoins are
+sent from. Addresses in Bitcoins are essentially the public
+keys. There are $2^{160}$ possible addresses, which is such a
+vast amount that there is not even a check for duplicates, or
+already used addresses. If you start with a random number to
+generate a public-private key pair it is very unlikely that
+you step on somebody else's shoes. Compare this with the
+email-addresses you wanted to register with, say
+Gmail, but which are always already taken.
-Well, we could go on and on\ldots{}and that has not even
-started us yet with all the naughty things NSA \& Friends are
-up to. Why does privacy actually matter? Nobody, I think, has
-a conclusive answer to this question yet. Maybe the following
-four notions help with clarifying the overall picture
-somewhat:
+One major difference between Bitcoins and traditional banking
+is that you do not have a place, or few places, that record the
+balance on your account. Traditional banking involves a
+central ledger which specifies the current balance in each
+account, for example
+
+\begin{center}
+\begin{tabular}{l|r}
+account owner & balance\\\hline
+Alice & \pounds{10.01}\\
+Bob & \pounds{4.99}\\
+Charlie & -\pounds{1.23}\\
+Eve & \pounds{0.00}
+\end{tabular}
+\end{center}
+
+\noindent Bitcoins work differently in that there is no such
+central ledger, but instead a public record of all
+transactions ever made. This means spending money corresponds
+to sending messages of the (oversimplified) form
+
+\begin{equation}
+\{\text{I, Alice, am giving Bob one Bitcoin.}\}_{K^{priv}_{Alice}}
+\end{equation}
+
+\noindent These messages, called transactions, are the only
+data that is ever stored in the Bitcoin system (we will come
+to the precise details later on). The transactions are
+encrypted with Alice's private key so that everybody,
+including Bob, can use Alice's public key $K^{pub}_{Alice}$ to
+verify that this message came really from Alice, or more
+precisely from the person who knows $K^{priv}_{Alice}$.
+
+The problem with such messages in a distributed system is that
+what happens if Bob receives 10, say, of these transactions?
+Did Alice intend to send him 10 Bitcoins, or did the message
+get duplicated by for example an attacker re-playing a sniffed
+message? What is needed is a kind of serial number for such
+transactions. This means transaction messages shoul look more like
+
+\begin{center}
+$\{\text{I, Alice, am giving Bob Bitcoin \#1234567.}\}_{K^{priv}_{Alice}}$
+\end{center}
+
+\noindent There are two difficulties, however, that need to be
+solved with serial numbers. One is who is assigning serial
+numbers to Bitcoins and also how can Bob verify that Alice
+actually owns this Bitcoin to pay him? In a system with a bank
+as trusted third-party, Bob could do the following:
\begin{itemize}
-\item \textbf{Secrecy} is the mechanism used to limit the
- number of principals with access to information (e.g.,
- cryptography or access controls). For example I better
- keep my password secret, otherwise people from the wrong
- side of the law might impersonate me.
-
-\item \textbf{Confidentiality} is the obligation to protect
- the secrets of other people or organisations (secrecy
- for the benefit of an organisation). For example as a
- staff member at King's I have access to data, even
- private data, I am allowed to use in my work but not
- allowed to disclose to anyone else.
-
-\item \textbf{Anonymity} is the ability to leave no evidence of
- an activity (e.g., sharing a secret). This is not equal
- with privacy---anonymity is required in many
- circumstances, for example for whistle-blowers,
- voting, exam marking and so on.
-
-\item \textbf{Privacy} is the ability or right to protect your
- personal secrets (secrecy for the benefit of an
- individual). For example, in a job interview, I might
- not like to disclose that I am pregnant, if I were a
- woman, or that I am a father. Lest they might not hire
- me. Similarly, I might not like to disclose my location
- data, because thieves might break into my house if they
- know I am away at work. Privacy is essentially
- everything which ``shouldn't be anybody's business''.
-
+\item Bob asks the bank whether the Bitcoin with that serial
+ number belongs to Alice and Alice hasn't already spent
+ this Bitcoin.
+\item If yes, then Bob tells the bank he accepts this Bitcoin.
+ The bank updates the records to show that the Bitcoin
+ with that serial number is now in Bob’s possession and
+ no longer belongs to Alice.
\end{itemize}
-\noindent While this might provide us with some rough
-definitions, the problem with privacy is that it is an
-extremely fine line what should stay private and what should
-not. For example, since I am working in academia, I am every
-so often very happy to be a digital exhibitionist: I am very
-happy to disclose all `trivia' related to my work on my
-personal web-page. This is a kind of bragging that is normal
-in academia (at least in the field of CS), even expected if
-you look for a job. I am even happy that Google maintains a
-profile about all my academic papers and their citations.
+\noindent But for this banks would need to be trusted and
+would also be an easy target for any government interference,
+for example. Think of the early days of music sharing where
+the company Napster was the trusted third-party but also the single point of ``failure'' which
+was taken offline by law enforcement. Bitcoins is more like a
+system such as BitTorrent without a single central entity that
+can be taken offline.\footnote{There is some Bitcoin
+infrastructure that is not so immune from being taken offline:
+for example Bitcoin exchanges, HQs of Bitcoin mining pools,
+Bitcoin developers and so on.}
+
+Bitcoins solve the problem of not being able to rely on a bank
+by making everybody the ``bank''. Everybody who cares can have
+the entire transaction history starting with the first
+transaction made in January 2009. This history of transactions
+is called the \emph{blockchain}. Bob, for example, can use his
+copy of the blockchain for determining whether Alice owned the
+Bitcoin he received, and if she did, he transmits the message
+that he owns it now to every other participant on the Bitcoin
+network. An illustration of a three-block segment of the
+blockchain is (simplified) as follows
+
+\begin{equation}
+\includegraphics[scale=0.4]{../pics/bitcoinblockchain0.png}
+\label{segment}
+\end{equation}
+
+\noindent The chain grows with time. Each block contains a
+list of individual transactions, written txn in the picture
+above, and also a reference to the previous block, written
+prev. The data in a block (txn's and prev) is hashed so that
+the reference and transactions in them cannot be tampered
+with. This hash is also the unique serial number of each
+block. Since this previous-block-reference is also part of the
+hash, the whole chain is robust against tampering. I let you
+think why this is the case?\ldots{}But does it actually
+eliminate all possibilities of fraud?
+
+We can check the consistency of the blockchain by checking
+whether all the references and hashes are correctly recorded.
+I have not tried it myself, but it is said that with the
+current amount of data (appr.~12GB) it takes roughly a day to
+check the consistency of the blockchain on a normal computer.
+Fortunately this ``extended'' consistency check usually only
+needs to be done once. Afterwards the blockchain only needs to
+be updated consistently.
+
+Recall I wrote earlier that Bitcoins do not maintain a ledger,
+which lists all the current balances in each account. Instead
+only transactions are recorded. While a current balance of an
+account is not immediately available, it is possible to
+extract from the blockchain a transaction graph that looks
+like the picture shown in Figure~\ref{txngraph}. Each
+rectangle represents a single transaction. Take for example
+the rightmost lower transaction from Charles to Emily. This
+transaction has as receiver the address of Emily and as the
+sender the address of Charles. In this way no Bitcoins can
+appear out of thin air (we will discuss later how Bitcoins are
+actually generated). If Charles did not have a transaction of
+at least the amount he wants to give Emily to his name
+(i.e.~send to an address with his public-private key) then
+there is no way he can make a payment to Emily. Equally, if
+now Emily wants to pay for a coffee, say, with the Bitcoin she
+received from Charles she can essentially only forward the
+message she received. The only slight complication with this
+setup in Bitcoins is that ``incoming'' Bitcoins can be
+combined in a transaction and ``outgoing'' Bitcoins can be
+split. For example in the leftmost upper transactions in
+Figure~\ref{txngraph}, Fred makes a payment to Alice. But this
+payment (or transaction) combines the Bitcoins that were send
+by Jane to Fred and also by Juan to Fred. This allows you to
+``consolidate'' your funds: if it were only possible to split
+transactions, then the amounts would get smaller and smaller.
+
+In Bitcoins you have the ability to both combine incoming
+transactions, but also to split outgoing transactions to
+potentially more than one receiver. The latter is also needed.
+Consider again the rightmost transactions in
+Figure~\ref{txngraph} and suppose Alice is a coffeeshop owner
+selling coffees for 1 Bitcoin. Charles received a transaction
+from Zack over 5 Bitcoins, say. How does Charles pay for the
+coffee? There is no explicit notion of \emph{change} in the
+Bitcoin system. What Charles has to do instead is to make one
+single transaction with 1 Bitcoin to Alice and with 4 Bitcoins
+going back to himself, which then Charles can use to give to
+Emily, for example.
+
+\begin{figure}[t]
+\begin{center}
+\includegraphics[scale=0.4]{../pics/blockchain.png}
+\end{center}
+\caption{Transaction graph that is implicitly recorded in the
+public blockchain.\label{txngraph}}
+\end{figure}
+
+Let us consider another example. Suppose Emily received 4
+Bitcoins from Charles and independently received another
+transaction (not shown in the picture) that sends 6 Bitcoins
+to her. If she now wants to buy a coffee from Alice for 1
+Bitcoin, she has two possibilities: She could just forward the
+transaction from Charles over 4 Bitcoins to Alice split in
+such a way that Alice receives 1 Bitcoin and Emily sends the
+remaining 3 Bitcoins back to herself. In this case she would
+now be in the possession of two unspend Bitcoin transactions,
+one over 3 Bitcoins and the independent one over 6 Bitcoins.
+Or, Emily could combine both transactions (one over 4 Bitcoins
+from Charles and the independent one over 6 Bitcoins) and then
+split this amount with 1 Bitcoin going to Alice and 9 Bitcoins
+going back to herself.
+
+I think this is a good time for you to pause to let this
+concept of transactions to really sink in\ldots{}You should
+come to the conclusion that there is really no need for a central ledger and no
+need for an account balance as familiar from traditional
+banking. The closest what Bitcoin has to offer for the notion
+of a balance in a bank account are the unspend transactions
+that a person (more precisely a public-private key address)
+received. That means transactions that can still be forwarded.
+
+After the pause also consider the fact that whatever
+transaction is recorded in the blockchain will be in the
+``historical record'' for the Bitcoin system. If a transaction
+says 1 Bitcoin goes from address $A$ to address $B$, then this
+is what will be---$B$ has then the possibility to spend the
+corresponding Bitcoins, whether the transaction was done
+fraudulently or not. There is no exception to this rule.
+Interestingly this is also how Bitcoins can get lost: One
+possibility is that you send Bitcoins to an address for which
+nobody has generated a private key, for example because of a
+typo in the address field---bad luck for fat
+fingers\footnote{\url{http://en.wikipedia.org/wiki/Typographical_error}}
+in the Bitcoin system. The reason is that nobody has a private
+key for this erroneous address and consequently cannot forward
+the transaction anymore. Another possibility is that you
+forget your private key and you had messages forwarded to the
+corresponding public key. Also in this case bad luck: you will
+never be able to forward this message again, because you will
+not be able to form a valid message that sends this to
+somebody else (we will see the details of this later). But
+this is also a way how you can get robbed of your Bitcoins. By
+old-fashioned hacking-into-a-computer crime, for example, an
+attacker might get hold of your private key and then quickly
+forwards the Bitcoins that are in your name to an address the
+attacker controls. You will never again have access to these
+Bitcoins, because for the Bitcoin system they are assumed to
+be spent. And remember with Bitcoins you cannot appeal to any
+higher authority. Once the Bitcoins are gone, they are gone.
+This is much different in traditional banking where at least
+you can try to harass the bank to roll back the transaction.
+
+This brings us to back to problem of double spend. Suppose Bob
+is a merchant. How can he make sure that Alice does not cheat
+him? She could for example send a transaction to Bob. But also
+forward the ``same'' transaction to Charlie, or even herself.
+If Alice manages to get the second transaction into the
+blockchain, Bob will be cheated out of his money. The problem
+in such conflicting situations is how should the network
+update their blockchain? You might end up with a picture like
+this
+
+\begin{center}
+\includegraphics[scale=0.4]{../pics/bitcoindisagreement.png}
+\end{center}
-On the other hand I would be very irritated if anybody I do
-not know had a too close look on my private live---it
-shouldn't be anybody's business. The reason is that knowledge
-about my private life can often be used against me. As mentioned
-above, public location data might mean I get robbed. If
-supermarkets build a profile of my shopping habits, they will
-use it to \emph{their} advantage---surely not to \emph{my}
-advantage. Also whatever might be collected about my life will
-always be an incomplete, or even misleading, picture. For
-example I am pretty sure my creditworthiness score was
-temporarily(?) destroyed by not having a regular income in
-this country (before coming to King's I worked in Munich for
-five years). To correct such incomplete or flawed credit
-history data there is, since recently, a law that allows you
-to check what information is held about you for determining
-your creditworthiness. But this concerns only a very small
-part of the data that is held about me/you. Also
-what about cases where data is wrong or outdated (but do we
-need a right-to be forgotten).
+\noindent where Alice convinced some part of the ``world''
+that she is still the owner of the Bitcoin and some other part
+of the ``world'' thinks it's Bob's. How should such a
+disagreement be resolved? This is actually the main hurdle
+where Bitcoin really innovated. The answer is that Bob needs
+to convince ``enough'' people on the network that the
+transaction from Alice to him is legit.
+
+What does, however, ``enough'' mean in a distributed system?
+If Alice sets up a network of a billion, say, puppy identities
+and whenever Bob tries to convince, or validate, that he is
+the rightful owner of the Bitcoin, then the puppy identities
+agree. Bob would then have no reason to not give Alice her
+coffee. But behind his back she has convinced everybody else
+on the network that she is still the rightful owner of the
+Bitcoin. After being outvoted, Bob would be a tad peeved.
+
+The reflex reaction to such a situation would be to make the
+process of validating a transaction as cheap as possible. The
+intention is that Bob will easily get enough peers to agree with him
+that he is the rightful owner. But such a solution has always
+the limitation of Alice setting up an even bigger network of
+puppy identities. The really cool idea of Bitcoin is to go
+into the other direction of making the process of transaction
+validation (artificially) as expensive as possible, but reward
+people for helping with the validation. This is really a novel
+and counterintuitive idea that makes the whole system of
+Bitcoins work so beautifully.
+
+\subsubsection*{Proof-of-Work Puzzles}
+
+In order to make the process of transaction validation
+difficult, Bitcoin uses a kind of puzzle. Solving the puzzles
+is called \emph{Bitcoin mining}, where whoever solves a puzzle
+will be awarded some Bitcoins. At the beginning this was 50
+Bitcoins, but the rules of Bitcoin are set up such that this
+amount halves every 210,000 transactions or so. Currently you
+will be awarded 25 Bitcoins for solving a puzzle. Because the
+amount will halve again and then later again and again, around
+the year 2140 it will go below the level of 1 Satoshi. In that
+event no new Bitcoins will ever be created again and the
+amount of Bitcoins stays fixed. There will be still an
+incentive to help with validating transactions, because there
+is the possibility in Bitcoins to offer a transaction fee to
+whoever solves a puzzle. At the moment this fee is usually set
+to 0, since the incentive for miners is the 25 Bitcoins that
+are currently awarded for solving puzzles.
+
+What do the puzzles that miners have to solve look like? The
+puzzles can be illustrated roughly as follows: Given a string,
+say \code{"Hello, world!"}, what is the salt so that the hash
+starts with a long run of zeros? Let us look at a concrete
+example. Recall that Bitcoins use the hash-function SHA-256.
+Suppose we call this hash function \code{h}, then we could try
+the salt \code{0} as follows:
+
+\begin{quote}
+\code{h("Hello, world!0") =}\\
+\mbox{}\quad\footnotesize\pcode{1312af178c253f84028d480a6adc1e25e81caa44c749ec81976192e2ec934c64}
+\end{quote}
+
+\noindent OK this does not have any zeros at all. We could
+next try the salt \code{1}:
+
+\begin{quote}
+\code{h("Hello, world!1") =}\\
+\mbox{}\quad\footnotesize\pcode{e9afc424b79e4f6ab42d99c81156d3a17228d6e1eef4139be78e948a9332a7d8}
+\end{quote}
+
+\noindent Again this hash value does not contain any leading
+zeros. We could now try out every salt until we reach
+
+\begin{quote}
+\code{h("Hello, world!4250") =}\\
+\mbox{}\quad\footnotesize\pcode{0000c3af42fc31103f1fdc0151fa747ff87349a4714df7cc52ea464e12dcd4e9}
+\end{quote}
+
+\noindent where we have four leading zeros. If four zeros are
+enough, then the puzzle would be solved with this salt. The
+point is that we can very quickly check whether a salt solves
+a puzzle, but it is hard to find one. Latest research suggest
+it is an NP-problem. If we want the output hash value to begin
+with 10 zeroes, say, then we will, on average, need to try
+$16^{10} \approx 10^{12}$ different salts before we find a
+suitable one.
+
+In Bitcoins the puzzles are not solved according to how many
+leading zeros a hash-value has, but rather whether it is below
+a \emph{target}. The hardness of the puzzle can actually be
+controlled by changing the target according to the available
+computational power available. I think the adjustment of the
+hardness of the problems is done every 2060 blocks
+(appr.~every two weeks). The aim of the adjustment is that on
+average the Bitcoin network will most likely solve a puzzle
+within 10 Minutes.
+
+\begin{center}
+\includegraphics[scale=0.37]{../pics/blockchainsolving.png}
+\end{center}
+
+\noindent It could be solved quicker, but equally it could
+take longer, but on average after 10 Minutes somebody on the
+network will have found a solution.
+
+Remember that the puzzles are a kind of proof-of-work that
+make the validation of transactions artificially expensive.
+Consider the following picture with a blockchain and some
+unconfirmed transactions.
+
+\begin{equation}
+\includegraphics[scale=0.38]{../pics/bitcoin_unconfirmed.png}
+\label{unconfirmed}
+\end{equation}
+
+\noindent The puzzle is stated as follows: There are some
+unconfirmed transactions. Choosing some of them, the miner
+(i.e.~the person/computer that tries to solve a puzzle) will
+form a putative block to be added to the blockchain. This
+putative block will contain the transactions and the reference
+to the previous block. The serial number of such a block is
+simply the hash of all the data. The puzzle can then be stated
+as the ``string'' corresponding to the block and which salt is
+needed in order to have the hashed value being below the
+target. Other miners will choose different transactions and
+therefore work on a slightly different putative block and
+puzzle.
+
+The intention of the proof-of-work puzzle is that the
+blockchain is at every given moment linearly ordered, see the
+picture shown in \eqref{unconfirmed}. If we don’t have such a
+linear ordering at any given moment then it may not be clear
+who owns which Bitcoins. Assume a miner David is lucky and
+finds a suitable salt to confirm some transactions. Should he
+celebrate? Not yet. Typically the blockchain will look as
+follows
+
+\begin{center}
+\includegraphics[scale=0.65]{../pics/block_chain1.png}
+\end{center}
+
+\noindent But every so often there will be a fork
+
+\begin{center}
+\includegraphics[scale=0.65]{../pics/block_chain_fork.png}
+\end{center}
+
+\noindent What should be done in this case? Well, the tie is
+broken if another block is solved, like so:
+
+\begin{center}
+\includegraphics[scale=0.4]{../pics/bitcoin_blockchain_branches.png}
+\end{center}
+
+\noindent The rule in Bitcoins is: If a fork occurs, people on
+the network keep track of all forks (they can see). But at any
+given time, miners only work to extend whichever fork is
+longest in their copy of the block chain. Why should miners
+work on the longest fork? Well their incentive is to mine
+Bitcoins. If somebody else already solved a puzzle, then it
+makes more sense to work on a new puzzle and obtain the
+Bitcoins for solving that puzzle, rather than waste efforts on
+a fork that is shorter and therefore less likely to be
+``accepted''. Note that whoever solved a puzzle on the
+``loosing'' fork will actually not get any Bitcoins as reward.
+Tough luck.
-To see how private matter can lead really to the wrong
-conclusions, take the example of Stephen Hawking: When he was
-diagnosed with his disease, he was given a life expectancy of
-two years. If employers would know about such problems, would
-they have employed Hawking? Now, he is enjoying his 70+
-birthday. Clearly personal medical data needs to stay private.
-
-To cut a long story short, I let you ponder about the two
-statements which are often voiced in discussions about privacy:
+\subsubsection*{Alice against the Rest of the World}
-\begin{itemize}
-\item \textit{``You have zero privacy anyway. Get over
-it.''}\\
-\mbox{}\hfill{}{\small{}(by Scott Mcnealy, former CEO of Sun)}
+Let us see how the blockchain and the proof-of-work puzzles
+avoid the problem of double spend. If Alice wants to cheat
+Bob, she would need to pull off the following ploy:
-\item \textit{``If you have nothing to hide, you have nothing
-to fear.''}
-\end{itemize}
-
-\noindent If you like to watch a movie which has this topic as
-its main focus I recommend \emph{Gattaca} from
-1997.\footnote{\url{http://www.imdb.com/title/tt0119177/}} If
-you want to read up on this topic, I can recommend the
-following article that appeared in 2011 in the Chronicle of
-Higher Education:
+\begin{center}
+\includegraphics[scale=0.4]{../pics/bitcoin_blockchain_double_spend.png}
+\end{center}
-\begin{center}
-\url{http://chronicle.com/article/Why-Privacy-Matters-Even-if/127461/}
-\end{center}
+\noindent Alice makes a transaction to Bob for paying, for
+example, for an online order. This transaction is confirmed,
+or validated, in block 2. Bob ships the goods around block 4.
+In this moment, Alice needs to get into action and try to
+validate the fraudulent transaction to herself instead. At
+this moment she is in a race against all the computing power
+of the ``rest of the world''. Because the incentive of the
+rest of the world is to work on the longest chain, that is the
+one with the transaction from Alice to Bob:
-\noindent Funnily, or maybe not so funnily, the author of this
-article carefully tries to construct an argument that does not
-only attack the nothing-to-hide statement in cases where
-governments \& co collect people's deepest secrets, or
-pictures of people's naked bodies, but an argument that
-applies also in cases where governments ``only'' collect data
-relevant to, say, preventing terrorism. The fun is of course
-that in 2011 we could just not imagine that respected
-governments would do such infantile things as intercepting
-people's nude photos. Well, since Snowden we know some people
-at the NSA did exactly that and then shared such photos among
-colleagues as ``fringe benefit''.
+\begin{center}
+\includegraphics[scale=0.4]{../pics/bitcoin_doublespend_blockchain_race.png}
+\end{center}
-
-\subsubsection*{Re-Identification Attacks}
+\noindent As shown in the picture she has to solve the puzzles
+2a to 5a one after the other, because the hash of a block is
+determined via the reference by all the data in the previous
+block. She might be very lucky to solve one puzzle for a block
+before the rest of the world, but to be lucky many times is
+very unlikely. This principle of having to race against the
+rest of the world avoids the ploy of double spend.
-Apart from philosophical musings, there are fortunately also
-some real technical problems with privacy. The problem I want
-to focus on in this handout is how to safely disclose datasets
-containing potentially very private data, say health records.
-What can go wrong with such disclosures can be illustrated
-with four well-known examples:
+In order to raise the bar for Alice even further, merchants
+accepting Bitcoins use the following rule of thumb: A
+transaction is ``confirmed'' if
\begin{itemize}
-\item In 2006, a then young company called Netflix offered a 1
- Mio \$ prize to anybody who could improve their movie
- rating algorithm. For this they disclosed a dataset
- containing 10\% of all Netflix users at the time
- (appr.~500K). They removed names, but included numerical
- ratings of movies as well as times when ratings were
- uploaded. Though some information was perturbed (i.e.,
- slightly modified).
-
- Two researchers had a closer look at this anonymised
- data and compared it with public data available from the
- International Movie Database (IMDb). They found that
- 98\% of the entries could be re-identified in the
- Netflix dataset: either by their ratings or by the dates
- the ratings were uploaded. The result was a class-action
- suit against Netflix, which was only recently resolved
- involving a lot of money.
+\item[(1)] it is part of a block in the longest fork, and
+\item[(2)] at least 5 blocks follow it in the longest fork. In
+ this case we say that the transaction has 6
+ confirmations.
+\end{itemize}
+
+\noindent A simple calculation shows that this amount of
+confirmations can take up to 1 hour and more. While this seems
+excessively long, from the merchant's point of view it is not
+that long at all. For this recall that ordinary creditcards
+can have their transactions been rolled-back for 6 months or
+so. The point however is that the odds for Alice being able to
+cheat are very low, unless she can muster more than 50\% of
+the world Bitcoin mining capacity. In this case she could
+out-race the rest of the world. The point is however that
+amassing such an amount of computing power is practically
+impossible for a single person or even a moderately large
+group.
-\item In the 1990ies, medical datasets were often made public
- for research purposes. This was done in anonymised form
- with names removed, but birth dates, gender and ZIP-code
- were retained. In one case where such data about
- hospital visits of state employees in Massachusetts was
- made public, the then governor assured the public that
- the released dataset protected patient privacy by
- deleting identifiers.
-
- A graduate student could not resist cross-referencing
- public voter data with the released data that still
- included birth dates, gender and ZIP-code. The result
- was that she could send the governor his own hospital
- record. It turns out that birth dates, gender and
- ZIP-code uniquely identify 87\% of people in the US.
- This work resulted in a number of laws prescribing which
- private data cannot be released in such datasets.
-
-\item In 2006, AOL published 20 million Web search queries
- collected from 650,000 users (names had been deleted).
- This was again done for research purposes. However,
- within days an old lady, Thelma Arnold, from Lilburn,
- Georgia, (11,596 inhabitants) was identified as user
- No.~4417749 in this dataset. It turned out that search
- engine queries are deep windows into people's private
- lives.
-
-\item Genome-Wide Association Studies (GWAS) was a public
- database of gene-frequency studies linked to diseases.
- It would essentially record that people who have a
- disease, say diabetes, have also certain genes. In order
- to maintain privacy, the dataset would only include
- aggregate information. In case of DNA data this
- aggregation was achieved by mixing the DNA of many
- individuals (having a disease) into a single solution.
- Then this mixture was sequenced and included in the
- dataset. The idea was that the aggregate information
- would still be helpful to researchers, but would protect
- the DNA data of individuals.
-
- In 2007 a forensic computer scientist showed that
- individuals can still be identified. For this he used
- the DNA data from a comparison group (people from the
- general public) and ``subtracted'' this data from the
- published data. He was left with data that included all
- ``special'' DNA-markers of the individuals present in
- the original mixture. He essentially deleted the
- ``background noise'' in the published data. The problem
- with DNA data is that it is of such a high resolution
- that even if the mixture contained maybe 100
- individuals, you can with current technology detect
- whether an individual was included in the mixture or
- not.
-
- This result changed completely how DNA data is nowadays
- published for research purposes. After the success of
- the human-genome project with a very open culture of
- exchanging data, it became much more difficult to
- anonymise data so that patient's privacy is preserved.
- The public GWAS database was taken offline in 2008.
-
+Connected with the 6-confirmation rule is an interesting
+phenomenon. On average, it would take several years for a
+typical computer to solve a proof-of-work puzzle, so an
+individual’s chance of ever solving one before the rest of the
+world, which typically takes only 10 minutes, is negligibly
+low. Therefore many people join groups called \emph{mining
+pools} that collectively work to solve blocks, and distribute
+rewards based on work contributed. These mining pools act
+somewhat like lottery pools among co-workers, except that some
+of these pools are quite large, and comprise more than 20\% of
+all the computers in the network. It is said that BTCC, a
+large mining pool, has limited its number of members in order
+to not solve more than 6 blocks in a row. Otherwise this would
+undermine the trust in Bitcoins, which is also not in the
+interest of BTCC, I guess. Some statistics on mining pools can
+be seen at
+
+\begin{center}
+\url{https://blockchain.info/pools}
+\end{center}
+
+\noindent Here is an interesting problem: You are part of a lottery
+pool, if you chip in some of the money to buy a lottery ticket. In
+this setting it is clear when you are in or outside of the pool. But
+how do you make sure people work hard in a mining pool in order to
+justify a fraction of any reward? If evil me had its way, I would just
+claim I do work and then sit back and relax. Or even if I do some work
+for a mining pool and I happen to find a correct salt, I would keep it
+secret and submit it to the bitcoin network on the ``side''. Actually,
+the idea of mining pools has opened up a full can of interesting
+problems.
+
+
+
+\subsubsection*{Bitcoins for Real}
+
+Let us now turn to the nitty gritty details. As a participant
+in the Bitcoin network you need to generate and store a
+public-private key pair. The public key you need to advertise
+in order to receive payments (transactions). The private key
+needs to be securely stored. For this there seem to be three
+possibilities
+
+\begin{itemize}
+\item an electronic wallet on your computer
+\item a cloud-based storage (offered by some Bitcoin services)
+\item paper-based
\end{itemize}
-\noindent There are many lessons that can be learned from
-these examples. One is that when making datasets public in
-anonymised form, you want to achieve \emph{forward privacy}.
-This means, no matter what other data that is also available
-or will be released later, the data in the original dataset
-does not compromise an individual's privacy. This principle
-was violated by the availability of ``outside data'' in the
-Netflix and governor of Massachusetts cases. The additional
-data permitted a re-identification of individuals in the
-dataset. In case of GWAS a new technique of re-identification
-compromised the privacy of people in the dataset. The case of
-the AOL dataset shows clearly how incomplete such data can be:
-Although the queries uniquely identified the older lady, she
-also looked up diseases that her friends had, which had
-nothing to do with her. Any rational analysis of her query
-data must therefore have concluded, the lady is on her
-death bed, while she was actually very much alive and kicking.
+\noindent The first two options of course offer convenience
+for making and receiving transactions. But given the nature of
+the private keys and how much security relies on them (recall
+if somebody gets hold of it, your Bitcoins are quickly lost
+forever) I would opt for the third option for anything except
+for trivial amounts of Bitcoins. As we have seen earlier in
+the course, securing a computer system that it can withstand a
+targeted breakin is still very much an unsolved problem.
+
+An interesting fact with Bitcoin keys is that there is no
+check for duplicate addresses. This means when generating a
+public-private key, you should really start with a carefully
+chosen random number such that there is really no chance to
+step on somebody's feet in the $2^{160}$ space of
+possibilities. Again if you share an address with somebody
+else, he or she has access to all your unspend transactions.
+The absence of such a check is easily explained: How would one
+do this in a distributed system? The answer you can't. It is
+possible to do some sanity check of addresses that are already
+used in the blockchain, but this is not a fail-proof method.
+One really has to trust on the enormity of the $2^{160}$
+space for addresses.
+
+Let us now look at the concrete data that is stored in an transaction
+message:
+
+\lstinputlisting[language=Scala]{../slides/msg}
+
+\noindent The hash in Line 1 is the hash of all the data that
+follows. It is a kind of serial number for the transaction.
+Line 2 contains a version number in case there are some
+incompatible changes to be made. Lines 3 and 4 specify how
+many incoming transactions are combined and how many outgoing
+transactions there are. In our example there are one for each.
+Line 5 specifies a lock time for when the transaction is
+supposed to become active---this is usually set to 0 to become
+active immediately. Line 6 specifies the size of the message;
+it has nothing to do with the Bitcoins that are transferred.
+Lines 7 to 11 specify where the Bitcoins in the transaction
+are coming from. The has in line 9 specifies the incoming
+transaction and the \pcode{n} in Line 10 specifies which
+output of the transaction is referred to. The signature in
+line 11 specifies the address (public key $K^{pub}$) from
+where the Bitcoins are taken and the digital signature of the
+address, that is $\{K^{pub}\}_{K^{priv}}$. Lines 12 to 15
+specify the value of the first outgoing transaction. In this
+case 0.319 Bitcoins. The hash in Line 14 specifies the address
+to where the Bitcoins are transferred.
+
+As can be seen there is no need to issue serial numbers for
+transactions, the hash of the transaction data can do this
+job. The hash will contain the sender addresses and
+hash-references to the incoming transactions, as well as the
+public key of the incoming transaction. This uniquely
+identifies a transaction and the hash is the unique
+fingerprint of it. The in-field also contains the address to
+which a earlier transaction is made. The digital signature
+ensures everybody can check that the person who makes this
+transaction is in the possession of the private key. Otherwise
+the signature would not match up with the public-key address.
+
+When mining the blockchain it only needs to be ensured that
+the transactions are consistent (all hashes and signatures
+match up). Then we need to generate the correct previous-block
+link and solve the resulting puzzle. Once the block is
+accepted, everybody can check the integrity of the whole
+blockchain.
+
+A word of warning: The point of a lottery is that some people
+win. But equally, that most people lose. Mining Bitcoins has
+pretty much the same point. According to the article below, a
+very large machine (very, very large in terms of June 2014)
+could potentially mine \$40 worth of Bitcoins a day, but would
+require magnitudes more of electricity costs to do so.
-In 2016, Yahoo released the so far largest machine learning
-dataset to the research community. It includes approximately
-13.5 TByte of data representing around 100 Billion events from
-anonymized user-news items, collected by recording
-interactions of about 20M users from February 2015 to May
-2015. Yahoo's gracious goal is to promote independent research
-in the fields of large-scale machine learning and recommender
-systems. It remains to be seen whether this data will really
-only be used for that purpose.
+\begin{center}
+\url{http://bitcoinmagazine.com/13774/government-bans-professor-mining-bitcoin-supercomputer/}
+\end{center}
+
+\noindent Bitcoin mining nowadays is only competitive, or
+profitable, if you get the energy for free, or use special
+purpose computing devices.
+
+This about ``free'' energy can actually hurt you very badly in
+unexpected ways. You probably have heard about, or even used,
+Amazon's Elastic Compute Cloud (EC2). Essentially, Amazon is
+selling computing power that you can use to run your web site,
+for example. It is \emph{elastic} in the sense that if you
+have a lot of visitors, you pay a lot, if you have only a few,
+then it is cheap. In order to bill you, you need to set
+up an account with Amazon and receive some secret keys in
+order to authenticate you. The clever (but also dangerous) bit
+is that you upload the code of your web site to GitHub and
+Amazon will pull it from there. You can probably already guess
+where this is going: in order to learn about Amazon's API, it
+gives out some limited computing power for free. Somebody used
+this offer in order to teach himself Ruby on Rails with a
+mildly practical website. Unfortunately, he uploaded also his
+secret keys to GitHub (this is really an easy mistake). Now,
+nasty people crawl GitHub for the purpose of stealing such
+secret keys. What can they do with this? Well, they quickly
+max out the limit of computing power with Amazon and mine
+Bitcoins (under somebody else's account). Fortunately for this
+guy, Amazon was aware of this scam and in a goodwill gesture
+refunded him the money the nasty guys incurred over
+night with their Bitcoin mining. If you want to read the
+complete story, google for ``My \$2375 Amazon EC2 Mistake''.
+
+\subsubsection*{Multi-Signature Transactions}
+
+To be explained.
-\subsubsection*{Differential Privacy}
+\subsubsection*{Anonymity with Bitcoins}
+
+One question one often hears is how anonymous is it actually
+to pay with Bitcoins? Paying with paper money used to be a
+quite anonymous act (unlike paying with credit cards, for
+example). But this has changed nowadays: You cannot come to a
+bank anymore with a suitcase full of money and try to open a
+bank account. Strict money laundering and taxation laws mean
+that not even Swiss banks are prepared to take such money and
+open a bank account. That is why Bitcoins are touted as
+filling this niche again of anonymous payments.
+
+While Bitcoins are intended to be anonymous, the reality is
+slightly different. I fully agree with the statement by
+Nielsen from the blog article I referenced at the beginning:
+
+\begin{quote}\it{}``Many people claim that Bitcoin can be used
+anonymously. This claim has led to the formation of
+marketplaces such as Silk Road (and various successors), which
+specialize in illegal goods. However, the claim that Bitcoin
+is anonymous is a myth. The block chain is public, meaning
+that it’s possible for anyone to see every Bitcoin transaction
+ever. Although Bitcoin addresses aren't immediately associated
+to real-world identities, computer scientists have done a
+great deal of work figuring out how to de-anonymise
+`anonymous' social networks. The block chain is a marvellous
+target for these techniques. I will be extremely surprised if
+the great majority of Bitcoin users are not identified with
+relatively high confidence and ease in the near future.''
+\end{quote}
+
+\noindent The only thing I can add to this is that with the Bitcoin
+blockchain we will in the future have even more pleasure hearing
+confessions from reputable or not-so-reputable people, like the
+infamous ``I did not inhale'' from an US
+president.\footnote{\url{www.youtube.com/watch?v=Bktd_Pi4YJw}} The
+whole point of the blockchain is that it public and will always be.
-Differential privacy is one of the few methods that tries to
-achieve forward privacy. The basic idea is to add appropriate
-noise, or errors, to any query of the dataset. The intention
-is to make the result of a query insensitive to individual
-entries in the database. That means the results are
-approximately the same no matter if a particular individual is
-in the dataset or not. The hope is that the added error does
-not eliminate the ``signal'' one is looking for in the
-dataset.
+There are some precautions one can take for boosting anonymity, for
+example to use a new public-private key pair for every new
+transaction, and to access Bitcoin only through the Tor network. But
+the transactions in Bitcoins are designed such that they allow one to
+combine incoming transactions. In such cases we know they must have
+been made by the single person who knew the corresponding private
+keys. So using different public-private keys for each transaction
+might not actually make the de-anonymisation task much harder. And the
+point about de-ano\-nymising `anonymous' social networks is that the
+information is embedded into the structure of the transition
+graph. And this cannot be erased with Bitcoins.
+
+One paper that has fun with spotting transactions made to Silk Road (2.0)
+and also to Wikileaks is
+
+\begin{center}
+\url{http://people.csail.mit.edu/spillai/data/papers/bitcoin-transaction-graph-analysis.pdf}
+\end{center}
+
+\noindent
+A paper that gathers some statistical data about the blockchain is
+
+\begin{center}
+\url{https://eprint.iacr.org/2012/584.pdf}
+\end{center}
+
+\subsubsection*{Government Meddling}
+
+Finally, what are the options for a typical Western government to
+meddle with Bitcoins? This is of course one feature the proponents of
+Bitcoins also tout: namely that there aren't any options. In my
+opinion this is far too naive and far from the truth. Let us assume
+some law enforcement agencies would not have been able to uncover the
+baddies from Silk Road 1.0 and 2.0 (they have done so by uncovering
+the Tor network, which is an incredible feat on its own). Would the
+government in question have stopped? I do not think so. The next
+target would have been Bitcoin. If I were the government, this is
+what I would consider:
-%\begin{center}
-%User\;\;\;\;
-%\begin{tabular}{c}
-%tell me $f(x)$ $\Rightarrow$\\
-%$\Leftarrow$ $f(x) + \text{noise}$
-%\end{tabular}
-%\;\;\;\;\begin{tabular}{@{}c}
-%Database\\
-%$x_1, \ldots, x_n$
-%\end{tabular}
-%\end{center}
-%
-%\begin{center}
-%\begin{tabular}{l|l}
-%Staff & Salary\\\hline
-%$PM$ & \pounds{107}\\
-%$PF$ & \pounds{102}\\
-%$LM_1$ & \pounds{101}\\
-%$LF_2$ & \pounds{97}\\
-%$LM_3$ & \pounds{100}\\
-%$LM_4$ & \pounds{99}\\
-%$LF_5$ & \pounds{98}
-%\end{tabular}
-%\end{center}
-%
-%
-%\begin{center}
-%\begin{tikzpicture}
-%\begin{axis}[symbolic y coords={salary},
-% ytick=data,
-% height=3cm]
%\addplot+[jump mark mid] coordinates
%{(0,salary) (0.1,salary)
% (0.4,salary) (0.5,salary)
% (0.8,salary) (0.9,salary)};
%\end{axis}
%\end{tikzpicture}
-%\end{center}
-%
-%\begin{tikzpicture}[outline/.style={draw=#1,fill=#1!20}]
% \node [outline=red] {red box};
% \node [outline=blue] at (0,-1) {blue box};
%\end{tikzpicture}
+\begin{itemize}
+\item The government could compel ``mayor players'' to blacklist
+ Bitcoins (for example at Bitcoin exchanges, which are usually
+ located somewhere in the vicinity of the government's reach). This
+ would impinge on what is called \emph{fungibility} of Bitcoins and
+ make them much less attractive to baddies. Suddenly their
+ ``hard-earned'' Bitcoin money cannot be spent anymore. The attraction
+ of this option is that this blacklisting can be easily done
+ ``whole-sale'' and therefore be really be an attractive target for
+ governments \& Co.
+\item The government could attempt to coerce the developer
+ community of the Bitcoin tools. While this might be a
+ bit harder, we know certain governments are ready to
+ take such actions (we have seen this with Lavabit, just
+ that the developers there refused to play ball and shut
+ down their complete operation).
+\item The government could also put pressure on mining pools
+ in order to blacklist transactions from baddies. Or be a
+ big miner itself. Given the gigantic facilities that
+ are built for institutions like the NSA (pictures from
+ the Utah dessert)
+
+ \begin{center}
+ \includegraphics[scale=0.04]{../pics/nsautah1.jpg}
+ \hspace{3mm}
+ \includegraphics[scale=0.031]{../pics/nsautah2.jpg}
+ \end{center}
+
+ this would not be such a high bar to jump over. Remember it
+ ``only'' takes to be temporarily in control of 50\%-plus of the
+ mining capacity in order to undermine the trust in the
+ system. Given sophisticated stories like Stuxnet (where we still
+ do not know the precise details) maybe even such large
+ facilities are not really needed. What happens, for example, if
+ a government starts DoS attacks on existing miners? They have
+ complete control (unfortunately) of all mayor connectivity
+ providers, i.e.~ISPs.
+
+ There are estimates that the Bitcoin mining capacity
+ outperforms the top 500 supercomputers in the world,
+ combined(!):
+
+ \begin{center}\small
+ \url{http://www.forbes.com/sites/reuvencohen/2013/11/28/global-bitcoin-computing-power-now-256-times-faster-than-top-500-supercomputers-combined/}
+ \end{center}
+
+ But my gut feeling is that these are too simplistic
+ calculations. In security (and things like Bitcoins) the
+ world is never just black and white. The point is once
+ the trust is undermined, the Bitcoin system would need
+ to be evolved to Bitcoins 2.0. But who says that Bitcoin
+ 2.0 will honour the Bitcoins from Version 1.0?
+ \end{itemize}
-\ldots
+\noindent A government would potentially not really
+need to follow up with such threads. Just the rumour that it
+would, could be enough to get the Bitcoin-house-of-cards to
+tumble. Some governments have already such an ``impressive''
+trackrecord in this area, such a thread would be entirely
+credible. Because of all this, I would not have too much hope
+that Bitcoins are free from interference by governments \& Co when
+it will stand in their way, despite what everybody else is
+saying. To sum up, the technical details behind Bitcoins are
+simply cool. But still the entire Bitcoin ecosystem is in my
+humble opinion rather fragile.
+
+
+\subsubsection*{Isn't there anything good with Bitcoins?}
+
+As you can see, so far my argument was that yes the Bitcoin system is
+based on a lot of very cool technical ideas, but otherwise it is a big
+scam. You might wonder if there is not something good (in terms of
+valuable for civilisation) in the bitcoin system? I think there is
+actually: diamonds are quite valuable and because of this can be
+used as a form of `money'---just remember the song with the line
+`diamonds are forever'.
+
+The problem with diamonds is that in some places where they are found,
+they also fund some stupid wars. You like to set up a usable system
+whereby you can check whether a diamond comes from a reputable source
+(not funding any wars) or from a dodgy source. For this you have to
+know that `clearing houses' for diamonds can engrave with lasers
+unique numbers inside the diamonds. These engravings are invisible to
+the naked eye and as far as I remember these numbers cannot be removed,
+except by destroying the diamond. Even if it can be removes, diamonds
+without the number cannot (hopefully) be sold.
+How do bitcoins come into the picture? The idea is called
+\emph{coloured coins}, where you attach some additional information to
+some `coins'. In the diamond example the bitcoin transactions are
+supposed to act as a certificate where diamonds are from (reputable
+sources or not). For this you have to know that you can attach a very
+short custom-made message with each bitcoin transaction. So you would
+record the diamond number inside the message.
+
+Now, you would set the system up so that a trusted entity (which
+exists in the diamond world) buys with their public key bitcoins (or
+smaller amounts). These trusted entities are essentially the places
+that also cut the raw diamonds. The idea is whenever you buy a
+diamond, you like to have also the corresponding bitcoin
+transaction. If you want to sell the diamond, you make a transaction
+to the new owner. The new owner will ask for this message, because
+otherwise he/she cannot sell it later on.
+
+The advantage is that for each diamond you can trace back that the
+transaction must have originated from the trusted entity. If yes, your
+diamond will be sellable. If you do not have the message, the diamond
+comes from a dodgy source and will (hopefully) not be sellable later
+on. In this way you skew the incentives such that only legitimate
+diamond are of value. The bitcoin system just helps with being able to
+check whether the message originates from the trusted entity or
+not....you do not have to consult anybody else and pay money for this
+consultation. Or in any way reveal your identity by such a consultation
+(the police might just keep a particularly close an eye on who contacts
+such a clearing house).
+
+Since we hopefully all agree that funding stupid wars is bad, any
+system that can starve funds for such wars must be good. Piggy-bagging
+on the trust established by the bitcoin system on the public block chain
+makes such a system realisable.
\subsubsection*{Further Reading}
-Two cool articles about how somebody obtained via the Freedom
-of Information Law the taxicab dataset of New York and someone
-else showed how easy it is to mine for private information:
+Finally, finally, the article
\begin{center}\small
-\begin{tabular}{p{0.78\textwidth}}
-\url{http://chriswhong.com/open-data/foil_nyc_taxi/}\smallskip\\
-\url{http://research.neustar.biz/2014/09/15/riding-with-the-stars-passenger-privacy-in-the-nyc-taxicab-dataset}
-\end{tabular}
+\url{http://www.extremetech.com/extreme/155636-the-bitcoin-network-outperforms-the-top-500-supercomputers-combined}
\end{center}
-\noindent
-A readable article about how supermarkets mine your shopping
-habits (especially how they prey on new exhausted parents
-;o) appeared in 2012 in the New York Times:
+\noindent makes an interesting point: If people are willing to
+solve meaningless puzzles for hard, cold cash and with this
+achieve rather impressive results, what could we achieve if
+the UN, say, would find the money and incentivise people to,
+for example, solve protein folding
+puzzles?\footnote{\url{http://en.wikipedia.org/wiki/Protein_folding}}
+For this there are projects like
+Folding@home.\footnote{\url{http://folding.stanford.edu}}
+This might help with curing diseases such as Alzheimer or
+diabetes. The same point is made in the article
+
+\begin{center}\small
+\url{http://gizmodo.com/the-worlds-most-powerful-computer-network-is-being-was-504503726}
+\end{center}
+
+A definitely interesting and worthy use of Bitcoins has been explored
+in the thesis
\begin{center}
-\url{http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html}
-\end{center}
-
-\noindent An article that analyses privacy and shopping habits
-from a more economic point of view is available from:
-
-\begin{center}
-\url{http://www.dtc.umn.edu/~odlyzko/doc/privacy.economics.pdf}
+\url{http://enetium.com/resources/Thesis.pdf}
\end{center}
-\noindent An attempt to untangle the web of current technology
-for spying on consumers is published in:
+\noindent where the author proposes ways of publishing information
+that is censor-resistant as part of the blockchain. The idea is that
+if a government wants to use Bitcoins, it would also have to put up
+with plain-text data that can be included in a transaction.
-\begin{center}
-\url{http://cyberlaw.stanford.edu/files/publication/files/trackingsurvey12.pdf}
-\end{center}
+Ken Shirrif in his blog at
-\noindent An article that sheds light on the paradox that
-people usually worry about privacy invasions of little
-significance, and overlook the privacy invasion that might
-cause significant damage:
+\begin{center}\small
+\url{http://www.righto.com/2014/02/bitcoin-mining-hard-way-algorithms.html}
+\end{center}
-\begin{center}
-\url{http://www.heinz.cmu.edu/~acquisti/papers/Acquisti-Grossklags-Chapter-Etrics.pdf}
-\end{center}
-
-
-Interesting ideas
+\noindent writes that every day the electricity consumption of mining
+for bitcoins is roughly 15 Mega Watts---the energy consumption of a country
+like Cambodia. He writes:
-\begin{center}
-\url{https://adnauseam.io}
-\end{center}
-
-\noindent
-And a paper that predicts ad-blockers will in the end win over anti-ad-blocking:
-
-\begin{center}
-\url{http://randomwalker.info/publications/ad-blocking-framework-techniques.pdf}
-\end{center}
-
+\begin{quote}
+ \it{}``The difficulty of mining a block is astounding. At the
+ current difficulty, the chance of a hash succeeding is a bit less
+ than one in $10^{19}$. Finding a successful hash is harder than
+ finding a particular grain of sand from all the grains of sand on
+ Earth. To find a hash every ten minutes, the Bitcoin hash rate needs
+ to be insanely large. Currently, the miners on the Bitcoin network
+ are doing about 25 million gigahashes per second. That is, every
+ second about 25,000,000,000,000,000 blocks gets hashed. I estimate
+ (very roughly) that the total hardware used for Bitcoin mining cost
+ tens of millions of dollars and uses as much power as the country of
+ Cambodia.''
+\end{quote}
\end{document}
-http://randomwalker.info/teaching/fall-2012-privacy-technologies/?
-http://chronicle.com/article/Why-Privacy-Matters-Even-if/127461/
-http://repository.cmu.edu/cgi/viewcontent.cgi?article=1077&context=hcii
-https://josephhall.org/papers/NYU-MCC-1303-S2012_privacy_syllabus.pdf
-http://www.jetlaw.org/wp-content/uploads/2014/06/Bambauer_Final.pdf
-http://www.cs.cmu.edu/~yuxiangw/docs/Differential%20Privacy.pdf
-https://www.youtube.com/watch?v=Gx13lgEudtU
-https://www.cs.purdue.edu/homes/ctask/pdfs/CERIAS_Presentation.pdf
-http://www.futureofprivacy.org/wp-content/uploads/Differential-Privacy-as-a-Response-to-the-Reidentification-Threat-Klinefelter-and-Chin.pdf
-http://www.cis.upenn.edu/~aaroth/courses/slides/Overview.pdf
-http://www.cl.cam.ac.uk/~sjm217/papers/tor14design.pdf
+bit coin
+
+A fistful of bitcoins
+http://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf
+http://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf
+
+Ross Anderson & Co (no dispute resolution; co-ercion)
+http://www.cl.cam.ac.uk/~sjm217/papers/fc14evidence.pdf
-%%% Local Variables:
-%%% mode: latex
-%%% TeX-master: t
-%%% End:
+http://www.michaelnielsen.org/ddi/how-the-bitcoin-protocol-actually-works/
+http://www.imponderablethings.com/2013/07/how-bitcoin-works-under-hood.html
+
+http://randomwalker.info/bitcoin/
+
+Jeffrey Robinson
+Bitcon: The Naked Truth about Bitcoin
+
+The Bitcoin Backbone Protocol: Analysis and Applications
+https://eprint.iacr.org/2014/765.pdf
+
+Bitcoin book
+http://chimera.labs.oreilly.com/books/1234000001802/ch04.html#public_key_derivation
--- a/handouts/ho08.tex Sat Sep 23 19:39:53 2017 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,1011 +0,0 @@
-\documentclass{article}
-\usepackage{../style}
-\usepackage{../graphics}
-\usepackage{../langs}
-\usepackage{../data}
-
-%https://crypto.stanford.edu/cs251/
-%https://programmingblockchain.gitbooks.io/programmingblockchain/content/
-
-% bug in smart contracts
-% https://www.benthamsgaze.org/2016/06/17/smart-contracts-beyond-the-age-of-innocence/
-% http://hackingdistributed.com/2016/06/18/analysis-of-the-dao-exploit/
-
-% hard forks
-% https://github.com/bitcoin/bips/blob/master/bip-0050.mediawiki
-
-% only 25% needed to obtain larger shares of mining
-% http://www.cs.cornell.edu/~ie53/publications/btcProcFC.pdf
-
-% re-identification attacks
-% https://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf
-
-% bit-coin papers
-% https://crypto.stanford.edu/cs251/syllabus.html
-
-% bit coin talk --- at 20:00 mins
-%https://www.usenix.org/conference/lisa16/conference-program/presentation/perlman
-
-% In fact, far from freeing people from the oppression of the state,
-% blockchains perversely promise the perfect tool for a fully
-% auditable, tax compliant, cashless society. Similarly, the belief it
-% is an anonymous digital cash has quickly vanished and we are now
-% seeing a large number of analytics companies, set-up specifically to
-% work with law enforcement agencies, to police this new parallel
-% financial system.
-%
-% But today blockchain is riddled with
-% contradictions and misunderstandings. Most of its problems are very
-% fixable, if you want to fix them
-
-
-% history of bitcoins
-% https://futurism.com/images/this-week-in-tech-jan-15-22-2016/
-
-\begin{document}
-\fnote{\copyright{} Christian Urban, 2014, 2015}
-
-\section*{Handout 7 (Bitcoins)}
-
-In my opinion Bitcoins are an elaborate Ponzi
-scheme\footnote{\url{http://en.wikipedia.org/wiki/Ponzi_scheme}}---still
-the ideas behind them are really beautiful and not too
-difficult to understand. Since many colourful claims about
-Bitcoins float around in the mainstream and not-so-mainstream
-media, it will be instructive to re-examine such claims from a
-more technically informed vantage point. For example, it is
-often claimed that Bitcoins are anonymous and free from any
-potential government meddling. It turns out that the first
-claim ignores a lot of research in de-anonymising social
-networks, and the second underestimates the persuasive means a
-government has at its disposal.
-
-There are a lot of articles, blogposts, research papers
-etc.~available about Bitcoins. Below I will follow closely the
-very readable explanations from
-
-\begin{center}
-\url{http://www.michaelnielsen.org/ddi/how-the-bitcoin-protocol-actually-works/} \;\;and\smallskip\\
-\url{http://www.imponderablethings.com/2013/07/how-bitcoin-works-under-hood.html}
-\end{center}
-
-\noindent The latter also contains a link to a nice youtube
-video about the technical details behind Bitcoins. I will
-also use some of their pictures.
-
-Let us start with the question who invented Bitcoins? You
-could not make up the answer, but we actually do not know who
-the inventor is. All we know is that the first paper
-
-\begin{center}
-\url{https://bitcoin.org/bitcoin.pdf}
-\end{center}
-
-\noindent is signed by Satoshi Nakamoto, which however is
-likely only a pen name. There is a lot of speculation who
-could be the inventor, or inventors, but we simply do not
-know. This part of Bitcoins is definitely anonymous so far.
-The paper above is from the end of 2008; the first Bitcoin
-transaction was made in January 2009. The rules in Bitcoin are
-set up so that there will only ever be 21 Million Bitcoins
-with the maximum reached around the year 2140. Currently there
-are already 11 Million Bitcoins in `existence'. Contrast this
-with traditional fiat currencies where money can be printed
-almost at will. The smallest unit of a Bitcoin is called a
-Satoshi, which is the $10^{-8}$th part of a Bitcoin. Remember
-a Penny is the $10^{-2}$th part of a Pound.
-
-The two main cryptographic building blocks of Bitcoins are
-cryptographic hashing functions (SHA-256) and public-private
-keys using the elliptic-curve encryption scheme for digital
-signatures. Hashes are used to generate `fingerprints' of data
-that ensure integrity (absence of tampering). Public-private
-keys are used for signatures. For example sending a message,
-say $msg$, together with the encrypted version
-
-\[
-msg, \{msg\}_{K^{priv}}
-\]
-
-\noindent allows everybody with access to the corresponding
-public key $K^{pub}$ to verify that the message came from the
-person who knew the private key. Signatures are used in
-Bitcoins for verifying the addresses where the Bitcoins are
-sent from. Addresses in Bitcoins are essentially the public
-keys. There are $2^{160}$ possible addresses, which is such a
-vast amount that there is not even a check for duplicates, or
-already used addresses. If you start with a random number to
-generate a public-private key pair it is very unlikely that
-you step on somebody else's shoes. Compare this with the
-email-addresses you wanted to register with, say
-Gmail, but which are always already taken.
-
-One major difference between Bitcoins and traditional banking
-is that you do not have a place, or few places, that record the
-balance on your account. Traditional banking involves a
-central ledger which specifies the current balance in each
-account, for example
-
-\begin{center}
-\begin{tabular}{l|r}
-account owner & balance\\\hline
-Alice & \pounds{10.01}\\
-Bob & \pounds{4.99}\\
-Charlie & -\pounds{1.23}\\
-Eve & \pounds{0.00}
-\end{tabular}
-\end{center}
-
-\noindent Bitcoins work differently in that there is no such
-central ledger, but instead a public record of all
-transactions ever made. This means spending money corresponds
-to sending messages of the (oversimplified) form
-
-\begin{equation}
-\{\text{I, Alice, am giving Bob one Bitcoin.}\}_{K^{priv}_{Alice}}
-\end{equation}
-
-\noindent These messages, called transactions, are the only
-data that is ever stored in the Bitcoin system (we will come
-to the precise details later on). The transactions are
-encrypted with Alice's private key so that everybody,
-including Bob, can use Alice's public key $K^{pub}_{Alice}$ to
-verify that this message came really from Alice, or more
-precisely from the person who knows $K^{priv}_{Alice}$.
-
-The problem with such messages in a distributed system is that
-what happens if Bob receives 10, say, of these transactions?
-Did Alice intend to send him 10 Bitcoins, or did the message
-get duplicated by for example an attacker re-playing a sniffed
-message? What is needed is a kind of serial number for such
-transactions. This means transaction messages shoul look more like
-
-\begin{center}
-$\{\text{I, Alice, am giving Bob Bitcoin \#1234567.}\}_{K^{priv}_{Alice}}$
-\end{center}
-
-\noindent There are two difficulties, however, that need to be
-solved with serial numbers. One is who is assigning serial
-numbers to Bitcoins and also how can Bob verify that Alice
-actually owns this Bitcoin to pay him? In a system with a bank
-as trusted third-party, Bob could do the following:
-
-\begin{itemize}
-\item Bob asks the bank whether the Bitcoin with that serial
- number belongs to Alice and Alice hasn't already spent
- this Bitcoin.
-\item If yes, then Bob tells the bank he accepts this Bitcoin.
- The bank updates the records to show that the Bitcoin
- with that serial number is now in Bob’s possession and
- no longer belongs to Alice.
-\end{itemize}
-
-\noindent But for this banks would need to be trusted and
-would also be an easy target for any government interference,
-for example. Think of the early days of music sharing where
-the company Napster was the trusted third-party but also the single point of ``failure'' which
-was taken offline by law enforcement. Bitcoins is more like a
-system such as BitTorrent without a single central entity that
-can be taken offline.\footnote{There is some Bitcoin
-infrastructure that is not so immune from being taken offline:
-for example Bitcoin exchanges, HQs of Bitcoin mining pools,
-Bitcoin developers and so on.}
-
-Bitcoins solve the problem of not being able to rely on a bank
-by making everybody the ``bank''. Everybody who cares can have
-the entire transaction history starting with the first
-transaction made in January 2009. This history of transactions
-is called the \emph{blockchain}. Bob, for example, can use his
-copy of the blockchain for determining whether Alice owned the
-Bitcoin he received, and if she did, he transmits the message
-that he owns it now to every other participant on the Bitcoin
-network. An illustration of a three-block segment of the
-blockchain is (simplified) as follows
-
-\begin{equation}
-\includegraphics[scale=0.4]{../pics/bitcoinblockchain0.png}
-\label{segment}
-\end{equation}
-
-\noindent The chain grows with time. Each block contains a
-list of individual transactions, written txn in the picture
-above, and also a reference to the previous block, written
-prev. The data in a block (txn's and prev) is hashed so that
-the reference and transactions in them cannot be tampered
-with. This hash is also the unique serial number of each
-block. Since this previous-block-reference is also part of the
-hash, the whole chain is robust against tampering. I let you
-think why this is the case?\ldots{}But does it actually
-eliminate all possibilities of fraud?
-
-We can check the consistency of the blockchain by checking
-whether all the references and hashes are correctly recorded.
-I have not tried it myself, but it is said that with the
-current amount of data (appr.~12GB) it takes roughly a day to
-check the consistency of the blockchain on a normal computer.
-Fortunately this ``extended'' consistency check usually only
-needs to be done once. Afterwards the blockchain only needs to
-be updated consistently.
-
-Recall I wrote earlier that Bitcoins do not maintain a ledger,
-which lists all the current balances in each account. Instead
-only transactions are recorded. While a current balance of an
-account is not immediately available, it is possible to
-extract from the blockchain a transaction graph that looks
-like the picture shown in Figure~\ref{txngraph}. Each
-rectangle represents a single transaction. Take for example
-the rightmost lower transaction from Charles to Emily. This
-transaction has as receiver the address of Emily and as the
-sender the address of Charles. In this way no Bitcoins can
-appear out of thin air (we will discuss later how Bitcoins are
-actually generated). If Charles did not have a transaction of
-at least the amount he wants to give Emily to his name
-(i.e.~send to an address with his public-private key) then
-there is no way he can make a payment to Emily. Equally, if
-now Emily wants to pay for a coffee, say, with the Bitcoin she
-received from Charles she can essentially only forward the
-message she received. The only slight complication with this
-setup in Bitcoins is that ``incoming'' Bitcoins can be
-combined in a transaction and ``outgoing'' Bitcoins can be
-split. For example in the leftmost upper transactions in
-Figure~\ref{txngraph}, Fred makes a payment to Alice. But this
-payment (or transaction) combines the Bitcoins that were send
-by Jane to Fred and also by Juan to Fred. This allows you to
-``consolidate'' your funds: if it were only possible to split
-transactions, then the amounts would get smaller and smaller.
-
-In Bitcoins you have the ability to both combine incoming
-transactions, but also to split outgoing transactions to
-potentially more than one receiver. The latter is also needed.
-Consider again the rightmost transactions in
-Figure~\ref{txngraph} and suppose Alice is a coffeeshop owner
-selling coffees for 1 Bitcoin. Charles received a transaction
-from Zack over 5 Bitcoins, say. How does Charles pay for the
-coffee? There is no explicit notion of \emph{change} in the
-Bitcoin system. What Charles has to do instead is to make one
-single transaction with 1 Bitcoin to Alice and with 4 Bitcoins
-going back to himself, which then Charles can use to give to
-Emily, for example.
-
-\begin{figure}[t]
-\begin{center}
-\includegraphics[scale=0.4]{../pics/blockchain.png}
-\end{center}
-\caption{Transaction graph that is implicitly recorded in the
-public blockchain.\label{txngraph}}
-\end{figure}
-
-Let us consider another example. Suppose Emily received 4
-Bitcoins from Charles and independently received another
-transaction (not shown in the picture) that sends 6 Bitcoins
-to her. If she now wants to buy a coffee from Alice for 1
-Bitcoin, she has two possibilities: She could just forward the
-transaction from Charles over 4 Bitcoins to Alice split in
-such a way that Alice receives 1 Bitcoin and Emily sends the
-remaining 3 Bitcoins back to herself. In this case she would
-now be in the possession of two unspend Bitcoin transactions,
-one over 3 Bitcoins and the independent one over 6 Bitcoins.
-Or, Emily could combine both transactions (one over 4 Bitcoins
-from Charles and the independent one over 6 Bitcoins) and then
-split this amount with 1 Bitcoin going to Alice and 9 Bitcoins
-going back to herself.
-
-I think this is a good time for you to pause to let this
-concept of transactions to really sink in\ldots{}You should
-come to the conclusion that there is really no need for a central ledger and no
-need for an account balance as familiar from traditional
-banking. The closest what Bitcoin has to offer for the notion
-of a balance in a bank account are the unspend transactions
-that a person (more precisely a public-private key address)
-received. That means transactions that can still be forwarded.
-
-After the pause also consider the fact that whatever
-transaction is recorded in the blockchain will be in the
-``historical record'' for the Bitcoin system. If a transaction
-says 1 Bitcoin goes from address $A$ to address $B$, then this
-is what will be---$B$ has then the possibility to spend the
-corresponding Bitcoins, whether the transaction was done
-fraudulently or not. There is no exception to this rule.
-Interestingly this is also how Bitcoins can get lost: One
-possibility is that you send Bitcoins to an address for which
-nobody has generated a private key, for example because of a
-typo in the address field---bad luck for fat
-fingers\footnote{\url{http://en.wikipedia.org/wiki/Typographical_error}}
-in the Bitcoin system. The reason is that nobody has a private
-key for this erroneous address and consequently cannot forward
-the transaction anymore. Another possibility is that you
-forget your private key and you had messages forwarded to the
-corresponding public key. Also in this case bad luck: you will
-never be able to forward this message again, because you will
-not be able to form a valid message that sends this to
-somebody else (we will see the details of this later). But
-this is also a way how you can get robbed of your Bitcoins. By
-old-fashioned hacking-into-a-computer crime, for example, an
-attacker might get hold of your private key and then quickly
-forwards the Bitcoins that are in your name to an address the
-attacker controls. You will never again have access to these
-Bitcoins, because for the Bitcoin system they are assumed to
-be spent. And remember with Bitcoins you cannot appeal to any
-higher authority. Once the Bitcoins are gone, they are gone.
-This is much different in traditional banking where at least
-you can try to harass the bank to roll back the transaction.
-
-This brings us to back to problem of double spend. Suppose Bob
-is a merchant. How can he make sure that Alice does not cheat
-him? She could for example send a transaction to Bob. But also
-forward the ``same'' transaction to Charlie, or even herself.
-If Alice manages to get the second transaction into the
-blockchain, Bob will be cheated out of his money. The problem
-in such conflicting situations is how should the network
-update their blockchain? You might end up with a picture like
-this
-
-\begin{center}
-\includegraphics[scale=0.4]{../pics/bitcoindisagreement.png}
-\end{center}
-
-\noindent where Alice convinced some part of the ``world''
-that she is still the owner of the Bitcoin and some other part
-of the ``world'' thinks it's Bob's. How should such a
-disagreement be resolved? This is actually the main hurdle
-where Bitcoin really innovated. The answer is that Bob needs
-to convince ``enough'' people on the network that the
-transaction from Alice to him is legit.
-
-What does, however, ``enough'' mean in a distributed system?
-If Alice sets up a network of a billion, say, puppy identities
-and whenever Bob tries to convince, or validate, that he is
-the rightful owner of the Bitcoin, then the puppy identities
-agree. Bob would then have no reason to not give Alice her
-coffee. But behind his back she has convinced everybody else
-on the network that she is still the rightful owner of the
-Bitcoin. After being outvoted, Bob would be a tad peeved.
-
-The reflex reaction to such a situation would be to make the
-process of validating a transaction as cheap as possible. The
-intention is that Bob will easily get enough peers to agree with him
-that he is the rightful owner. But such a solution has always
-the limitation of Alice setting up an even bigger network of
-puppy identities. The really cool idea of Bitcoin is to go
-into the other direction of making the process of transaction
-validation (artificially) as expensive as possible, but reward
-people for helping with the validation. This is really a novel
-and counterintuitive idea that makes the whole system of
-Bitcoins work so beautifully.
-
-\subsubsection*{Proof-of-Work Puzzles}
-
-In order to make the process of transaction validation
-difficult, Bitcoin uses a kind of puzzle. Solving the puzzles
-is called \emph{Bitcoin mining}, where whoever solves a puzzle
-will be awarded some Bitcoins. At the beginning this was 50
-Bitcoins, but the rules of Bitcoin are set up such that this
-amount halves every 210,000 transactions or so. Currently you
-will be awarded 25 Bitcoins for solving a puzzle. Because the
-amount will halve again and then later again and again, around
-the year 2140 it will go below the level of 1 Satoshi. In that
-event no new Bitcoins will ever be created again and the
-amount of Bitcoins stays fixed. There will be still an
-incentive to help with validating transactions, because there
-is the possibility in Bitcoins to offer a transaction fee to
-whoever solves a puzzle. At the moment this fee is usually set
-to 0, since the incentive for miners is the 25 Bitcoins that
-are currently awarded for solving puzzles.
-
-What do the puzzles that miners have to solve look like? The
-puzzles can be illustrated roughly as follows: Given a string,
-say \code{"Hello, world!"}, what is the salt so that the hash
-starts with a long run of zeros? Let us look at a concrete
-example. Recall that Bitcoins use the hash-function SHA-256.
-Suppose we call this hash function \code{h}, then we could try
-the salt \code{0} as follows:
-
-\begin{quote}
-\code{h("Hello, world!0") =}\\
-\mbox{}\quad\footnotesize\pcode{1312af178c253f84028d480a6adc1e25e81caa44c749ec81976192e2ec934c64}
-\end{quote}
-
-\noindent OK this does not have any zeros at all. We could
-next try the salt \code{1}:
-
-\begin{quote}
-\code{h("Hello, world!1") =}\\
-\mbox{}\quad\footnotesize\pcode{e9afc424b79e4f6ab42d99c81156d3a17228d6e1eef4139be78e948a9332a7d8}
-\end{quote}
-
-\noindent Again this hash value does not contain any leading
-zeros. We could now try out every salt until we reach
-
-\begin{quote}
-\code{h("Hello, world!4250") =}\\
-\mbox{}\quad\footnotesize\pcode{0000c3af42fc31103f1fdc0151fa747ff87349a4714df7cc52ea464e12dcd4e9}
-\end{quote}
-
-\noindent where we have four leading zeros. If four zeros are
-enough, then the puzzle would be solved with this salt. The
-point is that we can very quickly check whether a salt solves
-a puzzle, but it is hard to find one. Latest research suggest
-it is an NP-problem. If we want the output hash value to begin
-with 10 zeroes, say, then we will, on average, need to try
-$16^{10} \approx 10^{12}$ different salts before we find a
-suitable one.
-
-In Bitcoins the puzzles are not solved according to how many
-leading zeros a hash-value has, but rather whether it is below
-a \emph{target}. The hardness of the puzzle can actually be
-controlled by changing the target according to the available
-computational power available. I think the adjustment of the
-hardness of the problems is done every 2060 blocks
-(appr.~every two weeks). The aim of the adjustment is that on
-average the Bitcoin network will most likely solve a puzzle
-within 10 Minutes.
-
-\begin{center}
-\includegraphics[scale=0.37]{../pics/blockchainsolving.png}
-\end{center}
-
-\noindent It could be solved quicker, but equally it could
-take longer, but on average after 10 Minutes somebody on the
-network will have found a solution.
-
-Remember that the puzzles are a kind of proof-of-work that
-make the validation of transactions artificially expensive.
-Consider the following picture with a blockchain and some
-unconfirmed transactions.
-
-\begin{equation}
-\includegraphics[scale=0.38]{../pics/bitcoin_unconfirmed.png}
-\label{unconfirmed}
-\end{equation}
-
-\noindent The puzzle is stated as follows: There are some
-unconfirmed transactions. Choosing some of them, the miner
-(i.e.~the person/computer that tries to solve a puzzle) will
-form a putative block to be added to the blockchain. This
-putative block will contain the transactions and the reference
-to the previous block. The serial number of such a block is
-simply the hash of all the data. The puzzle can then be stated
-as the ``string'' corresponding to the block and which salt is
-needed in order to have the hashed value being below the
-target. Other miners will choose different transactions and
-therefore work on a slightly different putative block and
-puzzle.
-
-The intention of the proof-of-work puzzle is that the
-blockchain is at every given moment linearly ordered, see the
-picture shown in \eqref{unconfirmed}. If we don’t have such a
-linear ordering at any given moment then it may not be clear
-who owns which Bitcoins. Assume a miner David is lucky and
-finds a suitable salt to confirm some transactions. Should he
-celebrate? Not yet. Typically the blockchain will look as
-follows
-
-\begin{center}
-\includegraphics[scale=0.65]{../pics/block_chain1.png}
-\end{center}
-
-\noindent But every so often there will be a fork
-
-\begin{center}
-\includegraphics[scale=0.65]{../pics/block_chain_fork.png}
-\end{center}
-
-\noindent What should be done in this case? Well, the tie is
-broken if another block is solved, like so:
-
-\begin{center}
-\includegraphics[scale=0.4]{../pics/bitcoin_blockchain_branches.png}
-\end{center}
-
-\noindent The rule in Bitcoins is: If a fork occurs, people on
-the network keep track of all forks (they can see). But at any
-given time, miners only work to extend whichever fork is
-longest in their copy of the block chain. Why should miners
-work on the longest fork? Well their incentive is to mine
-Bitcoins. If somebody else already solved a puzzle, then it
-makes more sense to work on a new puzzle and obtain the
-Bitcoins for solving that puzzle, rather than waste efforts on
-a fork that is shorter and therefore less likely to be
-``accepted''. Note that whoever solved a puzzle on the
-``loosing'' fork will actually not get any Bitcoins as reward.
-Tough luck.
-
-
-\subsubsection*{Alice against the Rest of the World}
-
-Let us see how the blockchain and the proof-of-work puzzles
-avoid the problem of double spend. If Alice wants to cheat
-Bob, she would need to pull off the following ploy:
-
-\begin{center}
-\includegraphics[scale=0.4]{../pics/bitcoin_blockchain_double_spend.png}
-\end{center}
-
-\noindent Alice makes a transaction to Bob for paying, for
-example, for an online order. This transaction is confirmed,
-or validated, in block 2. Bob ships the goods around block 4.
-In this moment, Alice needs to get into action and try to
-validate the fraudulent transaction to herself instead. At
-this moment she is in a race against all the computing power
-of the ``rest of the world''. Because the incentive of the
-rest of the world is to work on the longest chain, that is the
-one with the transaction from Alice to Bob:
-
-\begin{center}
-\includegraphics[scale=0.4]{../pics/bitcoin_doublespend_blockchain_race.png}
-\end{center}
-
-\noindent As shown in the picture she has to solve the puzzles
-2a to 5a one after the other, because the hash of a block is
-determined via the reference by all the data in the previous
-block. She might be very lucky to solve one puzzle for a block
-before the rest of the world, but to be lucky many times is
-very unlikely. This principle of having to race against the
-rest of the world avoids the ploy of double spend.
-
-In order to raise the bar for Alice even further, merchants
-accepting Bitcoins use the following rule of thumb: A
-transaction is ``confirmed'' if
-
-\begin{itemize}
-\item[(1)] it is part of a block in the longest fork, and
-\item[(2)] at least 5 blocks follow it in the longest fork. In
- this case we say that the transaction has 6
- confirmations.
-\end{itemize}
-
-\noindent A simple calculation shows that this amount of
-confirmations can take up to 1 hour and more. While this seems
-excessively long, from the merchant's point of view it is not
-that long at all. For this recall that ordinary creditcards
-can have their transactions been rolled-back for 6 months or
-so. The point however is that the odds for Alice being able to
-cheat are very low, unless she can muster more than 50\% of
-the world Bitcoin mining capacity. In this case she could
-out-race the rest of the world. The point is however that
-amassing such an amount of computing power is practically
-impossible for a single person or even a moderately large
-group.
-
-Connected with the 6-confirmation rule is an interesting
-phenomenon. On average, it would take several years for a
-typical computer to solve a proof-of-work puzzle, so an
-individual’s chance of ever solving one before the rest of the
-world, which typically takes only 10 minutes, is negligibly
-low. Therefore many people join groups called \emph{mining
-pools} that collectively work to solve blocks, and distribute
-rewards based on work contributed. These mining pools act
-somewhat like lottery pools among co-workers, except that some
-of these pools are quite large, and comprise more than 20\% of
-all the computers in the network. It is said that BTCC, a
-large mining pool, has limited its number of members in order
-to not solve more than 6 blocks in a row. Otherwise this would
-undermine the trust in Bitcoins, which is also not in the
-interest of BTCC, I guess. Some statistics on mining pools can
-be seen at
-
-\begin{center}
-\url{https://blockchain.info/pools}
-\end{center}
-
-\noindent Here is an interesting problem: You are part of a lottery
-pool, if you chip in some of the money to buy a lottery ticket. In
-this setting it is clear when you are in or outside of the pool. But
-how do you make sure people work hard in a mining pool in order to
-justify a fraction of any reward? If evil me had its way, I would just
-claim I do work and then sit back and relax. Or even if I do some work
-for a mining pool and I happen to find a correct salt, I would keep it
-secret and submit it to the bitcoin network on the ``side''. Actually,
-the idea of mining pools has opened up a full can of interesting
-problems.
-
-
-
-\subsubsection*{Bitcoins for Real}
-
-Let us now turn to the nitty gritty details. As a participant
-in the Bitcoin network you need to generate and store a
-public-private key pair. The public key you need to advertise
-in order to receive payments (transactions). The private key
-needs to be securely stored. For this there seem to be three
-possibilities
-
-\begin{itemize}
-\item an electronic wallet on your computer
-\item a cloud-based storage (offered by some Bitcoin services)
-\item paper-based
-\end{itemize}
-
-\noindent The first two options of course offer convenience
-for making and receiving transactions. But given the nature of
-the private keys and how much security relies on them (recall
-if somebody gets hold of it, your Bitcoins are quickly lost
-forever) I would opt for the third option for anything except
-for trivial amounts of Bitcoins. As we have seen earlier in
-the course, securing a computer system that it can withstand a
-targeted breakin is still very much an unsolved problem.
-
-An interesting fact with Bitcoin keys is that there is no
-check for duplicate addresses. This means when generating a
-public-private key, you should really start with a carefully
-chosen random number such that there is really no chance to
-step on somebody's feet in the $2^{160}$ space of
-possibilities. Again if you share an address with somebody
-else, he or she has access to all your unspend transactions.
-The absence of such a check is easily explained: How would one
-do this in a distributed system? The answer you can't. It is
-possible to do some sanity check of addresses that are already
-used in the blockchain, but this is not a fail-proof method.
-One really has to trust on the enormity of the $2^{160}$
-space for addresses.
-
-Let us now look at the concrete data that is stored in an transaction
-message:
-
-\lstinputlisting[language=Scala]{../slides/msg}
-
-\noindent The hash in Line 1 is the hash of all the data that
-follows. It is a kind of serial number for the transaction.
-Line 2 contains a version number in case there are some
-incompatible changes to be made. Lines 3 and 4 specify how
-many incoming transactions are combined and how many outgoing
-transactions there are. In our example there are one for each.
-Line 5 specifies a lock time for when the transaction is
-supposed to become active---this is usually set to 0 to become
-active immediately. Line 6 specifies the size of the message;
-it has nothing to do with the Bitcoins that are transferred.
-Lines 7 to 11 specify where the Bitcoins in the transaction
-are coming from. The has in line 9 specifies the incoming
-transaction and the \pcode{n} in Line 10 specifies which
-output of the transaction is referred to. The signature in
-line 11 specifies the address (public key $K^{pub}$) from
-where the Bitcoins are taken and the digital signature of the
-address, that is $\{K^{pub}\}_{K^{priv}}$. Lines 12 to 15
-specify the value of the first outgoing transaction. In this
-case 0.319 Bitcoins. The hash in Line 14 specifies the address
-to where the Bitcoins are transferred.
-
-As can be seen there is no need to issue serial numbers for
-transactions, the hash of the transaction data can do this
-job. The hash will contain the sender addresses and
-hash-references to the incoming transactions, as well as the
-public key of the incoming transaction. This uniquely
-identifies a transaction and the hash is the unique
-fingerprint of it. The in-field also contains the address to
-which a earlier transaction is made. The digital signature
-ensures everybody can check that the person who makes this
-transaction is in the possession of the private key. Otherwise
-the signature would not match up with the public-key address.
-
-When mining the blockchain it only needs to be ensured that
-the transactions are consistent (all hashes and signatures
-match up). Then we need to generate the correct previous-block
-link and solve the resulting puzzle. Once the block is
-accepted, everybody can check the integrity of the whole
-blockchain.
-
-A word of warning: The point of a lottery is that some people
-win. But equally, that most people lose. Mining Bitcoins has
-pretty much the same point. According to the article below, a
-very large machine (very, very large in terms of June 2014)
-could potentially mine \$40 worth of Bitcoins a day, but would
-require magnitudes more of electricity costs to do so.
-
-\begin{center}
-\url{http://bitcoinmagazine.com/13774/government-bans-professor-mining-bitcoin-supercomputer/}
-\end{center}
-
-\noindent Bitcoin mining nowadays is only competitive, or
-profitable, if you get the energy for free, or use special
-purpose computing devices.
-
-This about ``free'' energy can actually hurt you very badly in
-unexpected ways. You probably have heard about, or even used,
-Amazon's Elastic Compute Cloud (EC2). Essentially, Amazon is
-selling computing power that you can use to run your web site,
-for example. It is \emph{elastic} in the sense that if you
-have a lot of visitors, you pay a lot, if you have only a few,
-then it is cheap. In order to bill you, you need to set
-up an account with Amazon and receive some secret keys in
-order to authenticate you. The clever (but also dangerous) bit
-is that you upload the code of your web site to GitHub and
-Amazon will pull it from there. You can probably already guess
-where this is going: in order to learn about Amazon's API, it
-gives out some limited computing power for free. Somebody used
-this offer in order to teach himself Ruby on Rails with a
-mildly practical website. Unfortunately, he uploaded also his
-secret keys to GitHub (this is really an easy mistake). Now,
-nasty people crawl GitHub for the purpose of stealing such
-secret keys. What can they do with this? Well, they quickly
-max out the limit of computing power with Amazon and mine
-Bitcoins (under somebody else's account). Fortunately for this
-guy, Amazon was aware of this scam and in a goodwill gesture
-refunded him the money the nasty guys incurred over
-night with their Bitcoin mining. If you want to read the
-complete story, google for ``My \$2375 Amazon EC2 Mistake''.
-
-\subsubsection*{Multi-Signature Transactions}
-
-To be explained.
-
-\subsubsection*{Anonymity with Bitcoins}
-
-One question one often hears is how anonymous is it actually
-to pay with Bitcoins? Paying with paper money used to be a
-quite anonymous act (unlike paying with credit cards, for
-example). But this has changed nowadays: You cannot come to a
-bank anymore with a suitcase full of money and try to open a
-bank account. Strict money laundering and taxation laws mean
-that not even Swiss banks are prepared to take such money and
-open a bank account. That is why Bitcoins are touted as
-filling this niche again of anonymous payments.
-
-While Bitcoins are intended to be anonymous, the reality is
-slightly different. I fully agree with the statement by
-Nielsen from the blog article I referenced at the beginning:
-
-\begin{quote}\it{}``Many people claim that Bitcoin can be used
-anonymously. This claim has led to the formation of
-marketplaces such as Silk Road (and various successors), which
-specialize in illegal goods. However, the claim that Bitcoin
-is anonymous is a myth. The block chain is public, meaning
-that it’s possible for anyone to see every Bitcoin transaction
-ever. Although Bitcoin addresses aren't immediately associated
-to real-world identities, computer scientists have done a
-great deal of work figuring out how to de-anonymise
-`anonymous' social networks. The block chain is a marvellous
-target for these techniques. I will be extremely surprised if
-the great majority of Bitcoin users are not identified with
-relatively high confidence and ease in the near future.''
-\end{quote}
-
-\noindent The only thing I can add to this is that with the Bitcoin
-blockchain we will in the future have even more pleasure hearing
-confessions from reputable or not-so-reputable people, like the
-infamous ``I did not inhale'' from an US
-president.\footnote{\url{www.youtube.com/watch?v=Bktd_Pi4YJw}} The
-whole point of the blockchain is that it public and will always be.
-
-There are some precautions one can take for boosting anonymity, for
-example to use a new public-private key pair for every new
-transaction, and to access Bitcoin only through the Tor network. But
-the transactions in Bitcoins are designed such that they allow one to
-combine incoming transactions. In such cases we know they must have
-been made by the single person who knew the corresponding private
-keys. So using different public-private keys for each transaction
-might not actually make the de-anonymisation task much harder. And the
-point about de-ano\-nymising `anonymous' social networks is that the
-information is embedded into the structure of the transition
-graph. And this cannot be erased with Bitcoins.
-
-One paper that has fun with spotting transactions made to Silk Road (2.0)
-and also to Wikileaks is
-
-\begin{center}
-\url{http://people.csail.mit.edu/spillai/data/papers/bitcoin-transaction-graph-analysis.pdf}
-\end{center}
-
-\noindent
-A paper that gathers some statistical data about the blockchain is
-
-\begin{center}
-\url{https://eprint.iacr.org/2012/584.pdf}
-\end{center}
-
-\subsubsection*{Government Meddling}
-
-Finally, what are the options for a typical Western government to
-meddle with Bitcoins? This is of course one feature the proponents of
-Bitcoins also tout: namely that there aren't any options. In my
-opinion this is far too naive and far from the truth. Let us assume
-some law enforcement agencies would not have been able to uncover the
-baddies from Silk Road 1.0 and 2.0 (they have done so by uncovering
-the Tor network, which is an incredible feat on its own). Would the
-government in question have stopped? I do not think so. The next
-target would have been Bitcoin. If I were the government, this is
-what I would consider:
-
-\begin{itemize}
-\item The government could compel ``mayor players'' to blacklist
- Bitcoins (for example at Bitcoin exchanges, which are usually
- located somewhere in the vicinity of the government's reach). This
- would impinge on what is called \emph{fungibility} of Bitcoins and
- make them much less attractive to baddies. Suddenly their
- ``hard-earned'' Bitcoin money cannot be spent anymore. The attraction
- of this option is that this blacklisting can be easily done
- ``whole-sale'' and therefore be really be an attractive target for
- governments \& Co.
-\item The government could attempt to coerce the developer
- community of the Bitcoin tools. While this might be a
- bit harder, we know certain governments are ready to
- take such actions (we have seen this with Lavabit, just
- that the developers there refused to play ball and shut
- down their complete operation).
-\item The government could also put pressure on mining pools
- in order to blacklist transactions from baddies. Or be a
- big miner itself. Given the gigantic facilities that
- are built for institutions like the NSA (pictures from
- the Utah dessert)
-
- \begin{center}
- \includegraphics[scale=0.04]{../pics/nsautah1.jpg}
- \hspace{3mm}
- \includegraphics[scale=0.031]{../pics/nsautah2.jpg}
- \end{center}
-
- this would not be such a high bar to jump over. Remember it
- ``only'' takes to be temporarily in control of 50\%-plus of the
- mining capacity in order to undermine the trust in the
- system. Given sophisticated stories like Stuxnet (where we still
- do not know the precise details) maybe even such large
- facilities are not really needed. What happens, for example, if
- a government starts DoS attacks on existing miners? They have
- complete control (unfortunately) of all mayor connectivity
- providers, i.e.~ISPs.
-
- There are estimates that the Bitcoin mining capacity
- outperforms the top 500 supercomputers in the world,
- combined(!):
-
- \begin{center}\small
- \url{http://www.forbes.com/sites/reuvencohen/2013/11/28/global-bitcoin-computing-power-now-256-times-faster-than-top-500-supercomputers-combined/}
- \end{center}
-
- But my gut feeling is that these are too simplistic
- calculations. In security (and things like Bitcoins) the
- world is never just black and white. The point is once
- the trust is undermined, the Bitcoin system would need
- to be evolved to Bitcoins 2.0. But who says that Bitcoin
- 2.0 will honour the Bitcoins from Version 1.0?
- \end{itemize}
-
-\noindent A government would potentially not really
-need to follow up with such threads. Just the rumour that it
-would, could be enough to get the Bitcoin-house-of-cards to
-tumble. Some governments have already such an ``impressive''
-trackrecord in this area, such a thread would be entirely
-credible. Because of all this, I would not have too much hope
-that Bitcoins are free from interference by governments \& Co when
-it will stand in their way, despite what everybody else is
-saying. To sum up, the technical details behind Bitcoins are
-simply cool. But still the entire Bitcoin ecosystem is in my
-humble opinion rather fragile.
-
-
-\subsubsection*{Isn't there anything good with Bitcoins?}
-
-As you can see, so far my argument was that yes the Bitcoin system is
-based on a lot of very cool technical ideas, but otherwise it is a big
-scam. You might wonder if there is not something good (in terms of
-valuable for civilisation) in the bitcoin system? I think there is
-actually: diamonds are quite valuable and because of this can be
-used as a form of `money'---just remember the song with the line
-`diamonds are forever'.
-
-The problem with diamonds is that in some places where they are found,
-they also fund some stupid wars. You like to set up a usable system
-whereby you can check whether a diamond comes from a reputable source
-(not funding any wars) or from a dodgy source. For this you have to
-know that `clearing houses' for diamonds can engrave with lasers
-unique numbers inside the diamonds. These engravings are invisible to
-the naked eye and as far as I remember these numbers cannot be removed,
-except by destroying the diamond. Even if it can be removes, diamonds
-without the number cannot (hopefully) be sold.
-
-How do bitcoins come into the picture? The idea is called
-\emph{coloured coins}, where you attach some additional information to
-some `coins'. In the diamond example the bitcoin transactions are
-supposed to act as a certificate where diamonds are from (reputable
-sources or not). For this you have to know that you can attach a very
-short custom-made message with each bitcoin transaction. So you would
-record the diamond number inside the message.
-
-Now, you would set the system up so that a trusted entity (which
-exists in the diamond world) buys with their public key bitcoins (or
-smaller amounts). These trusted entities are essentially the places
-that also cut the raw diamonds. The idea is whenever you buy a
-diamond, you like to have also the corresponding bitcoin
-transaction. If you want to sell the diamond, you make a transaction
-to the new owner. The new owner will ask for this message, because
-otherwise he/she cannot sell it later on.
-
-The advantage is that for each diamond you can trace back that the
-transaction must have originated from the trusted entity. If yes, your
-diamond will be sellable. If you do not have the message, the diamond
-comes from a dodgy source and will (hopefully) not be sellable later
-on. In this way you skew the incentives such that only legitimate
-diamond are of value. The bitcoin system just helps with being able to
-check whether the message originates from the trusted entity or
-not....you do not have to consult anybody else and pay money for this
-consultation. Or in any way reveal your identity by such a consultation
-(the police might just keep a particularly close an eye on who contacts
-such a clearing house).
-
-Since we hopefully all agree that funding stupid wars is bad, any
-system that can starve funds for such wars must be good. Piggy-bagging
-on the trust established by the bitcoin system on the public block chain
-makes such a system realisable.
-
-\subsubsection*{Further Reading}
-
-Finally, finally, the article
-
-\begin{center}\small
-\url{http://www.extremetech.com/extreme/155636-the-bitcoin-network-outperforms-the-top-500-supercomputers-combined}
-\end{center}
-
-\noindent makes an interesting point: If people are willing to
-solve meaningless puzzles for hard, cold cash and with this
-achieve rather impressive results, what could we achieve if
-the UN, say, would find the money and incentivise people to,
-for example, solve protein folding
-puzzles?\footnote{\url{http://en.wikipedia.org/wiki/Protein_folding}}
-For this there are projects like
-Folding@home.\footnote{\url{http://folding.stanford.edu}}
-This might help with curing diseases such as Alzheimer or
-diabetes. The same point is made in the article
-
-\begin{center}\small
-\url{http://gizmodo.com/the-worlds-most-powerful-computer-network-is-being-was-504503726}
-\end{center}
-
-A definitely interesting and worthy use of Bitcoins has been explored
-in the thesis
-
-\begin{center}
-\url{http://enetium.com/resources/Thesis.pdf}
-\end{center}
-
-\noindent where the author proposes ways of publishing information
-that is censor-resistant as part of the blockchain. The idea is that
-if a government wants to use Bitcoins, it would also have to put up
-with plain-text data that can be included in a transaction.
-
-Ken Shirrif in his blog at
-
-\begin{center}\small
-\url{http://www.righto.com/2014/02/bitcoin-mining-hard-way-algorithms.html}
-\end{center}
-
-\noindent writes that every day the electricity consumption of mining
-for bitcoins is roughly 15 Mega Watts---the energy consumption of a country
-like Cambodia. He writes:
-
-\begin{quote}
- \it{}``The difficulty of mining a block is astounding. At the
- current difficulty, the chance of a hash succeeding is a bit less
- than one in $10^{19}$. Finding a successful hash is harder than
- finding a particular grain of sand from all the grains of sand on
- Earth. To find a hash every ten minutes, the Bitcoin hash rate needs
- to be insanely large. Currently, the miners on the Bitcoin network
- are doing about 25 million gigahashes per second. That is, every
- second about 25,000,000,000,000,000 blocks gets hashed. I estimate
- (very roughly) that the total hardware used for Bitcoin mining cost
- tens of millions of dollars and uses as much power as the country of
- Cambodia.''
-\end{quote}
-
-\end{document}
-
-bit coin
-
-A fistful of bitcoins
-http://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf
-http://cseweb.ucsd.edu/~smeiklejohn/files/imc13.pdf
-
-Ross Anderson & Co (no dispute resolution; co-ercion)
-http://www.cl.cam.ac.uk/~sjm217/papers/fc14evidence.pdf
-
-http://www.michaelnielsen.org/ddi/how-the-bitcoin-protocol-actually-works/
-http://www.imponderablethings.com/2013/07/how-bitcoin-works-under-hood.html
-
-http://randomwalker.info/bitcoin/
-
-Jeffrey Robinson
-Bitcon: The Naked Truth about Bitcoin
-
-The Bitcoin Backbone Protocol: Analysis and Applications
-https://eprint.iacr.org/2014/765.pdf
-
-Bitcoin book
-http://chimera.labs.oreilly.com/books/1234000001802/ch04.html#public_key_derivation
Binary file handouts/ho09.pdf has changed
--- a/handouts/ho09.tex Sat Sep 23 19:39:53 2017 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,615 +0,0 @@
-\documentclass{article}
-\usepackage{../style}
-\usepackage{../langs}
-\usepackage{../graphics}
-\usepackage{../grammar}
-\usepackage{multicol}
-
-\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}
-
-\begin{document}
-\fnote{\copyright{} Christian Urban, King's College London, 2014}
-
-%% why are shuttle flights so good with software
-%%http://www.fastcompany.com/28121/they-write-right-stuff
-
-\section*{Handout 9 (Static Analysis)}
-
-If we want to improve the safety and security of our programs,
-we need a more principled approach to programming. Testing is
-good, but as Edsger Dijkstra famously wrote:
-
-\begin{quote}\it
-``Program testing can be a very effective way to show the
-\underline{\smash{presence}} of bugs, but it is hopelessly
-inadequate for showing their \underline{\smash{absence}}.''
-\end{quote}
-
-\noindent While such a more principled approach has been the
-subject of intense study for a long, long time, only in the
-past few years some impressive results have been achieved. One
-is the complete formalisation and (mathematical) verification
-of a microkernel operating system called seL4.
-
-\begin{center}
-\url{http://sel4.systems}
-\end{center}
-
-\noindent In 2011 this work was included in the MIT Technology
-Review in the annual list of the world’s ten most important
-emerging
-technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}}
-While this work is impressive, its technical details are too
-enormous for an explanation here. Therefore let us look at
-something much simpler, namely finding out properties about
-programs using \emph{static analysis}.
-
-Static analysis is a technique that checks properties of a
-program without actually running the program. This should
-raise alarm bells with you---because almost all interesting
-properties about programs are equivalent to the halting
-problem, which we know is undecidable. For example estimating
-the memory consumption of programs is in general undecidable,
-just like the halting problem. Static analysis circumvents
-this undecidability-problem by essentially allowing answers
-\emph{yes} and \emph{no}, but also \emph{don't know}. With
-this ``trick'' even the halting problem becomes
-decidable\ldots{}for example we could always say \emph{don't
-know}. Of course this would be silly. The point is that we
-should be striving for a method that answers as often as
-possible either \emph{yes} or \emph{no}---just in cases when
-it is too difficult we fall back on the
-\emph{don't-know}-answer. This might sound all like abstract
-nonsense. Therefore let us look at a concrete example.
-
-
-\subsubsection*{A Simple, Idealised Programming Language}
-
-Our starting point is a small, idealised programming language.
-It is idealised because we cut several corners in comparison
-with real programming languages. The language we will study
-contains, amongst other things, variables holding integers.
-Using static analysis, we want to find out what the sign of
-these integers (positive or negative) will be when the program
-runs. This sign-analysis seems like a very simple problem. But
-even such simple problems, if approached naively, are in
-general undecidable, just like Turing's halting problem. I let
-you think why?
-
-
-Is sign-analysis of variables an interesting problem? Well,
-yes---if a compiler can find out that for example a variable
-will never be negative and this variable is used as an index
-for an array, then the compiler does not need to generate code
-for an underflow-check. Remember some languages are immune to
-buffer-overflow attacks, but they need to add underflow and
-overflow checks everywhere. According to John Regher, an
-expert in the field of compilers, overflow checks can cause
-5-10\% slowdown, in some languages even 100\% for tight
-loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If
-the compiler can omit the underflow check, for example, then
-this can potentially drastically speed up the generated code.
-
-What do programs in our simple programming language look like?
-The following grammar gives a first specification:
-
-\begin{multicols}{2}
-\begin{plstx}[rhs style=,one per line,left margin=9mm]
-: \meta{Stmt} ::= \meta{label} \texttt{:}
- | \meta{var} \texttt{:=} \meta{Exp}
- | \texttt{jmp?} \meta{Exp} \meta{label}
- | \texttt{goto} \meta{label}\\
-: \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\
-\end{plstx}
-\columnbreak
-\begin{plstx}[rhs style=,one per line]
: \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp}
- | \meta{Exp} \texttt{*} \meta{Exp}
- | \meta{Exp} \texttt{=} \meta{Exp}
- | \meta{num}
- | \meta{var}\\
-\end{plstx}
-\end{multicols}
-
-\noindent I assume you are familiar with such
-grammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}}
-There are three main syntactic categories: \emph{statments}
-and \emph{expressions} as well as \emph{programs}, which are
-sequences of statements. Statements are either labels,
-variable assignments, conditional jumps (\pcode{jmp?}) and
-unconditional jumps (\pcode{goto}). Labels are just strings,
-which can be used as the target of a jump. We assume that in
-every program the labels are unique---if there is a clash,
-then we do not know where to jump to. The conditional jumps
-and variable assignments involve (arithmetic) expressions.
-Expressions are either numbers, variables or compound
-expressions built up from \pcode{+}, \pcode{*} and \emph{=}
-(for simplicity reasons we do not consider any other
-operations). We assume we have negative and positive numbers,
-\ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
-\pcode{2}\ldots{} An example program that calculates the
-factorial of 5 is in our programming language as follows:
-
-\begin{lstlisting}[language={},xleftmargin=10mm]
- a := 1
- n := 5
-top:
- jmp? n = 0 done
- a := a * n
- n := n + -1
- goto top
-done:
-\end{lstlisting}
-
-\noindent As can be seen each line of the program contains a
-statement. In the first two lines we assign values to the
-variables \pcode{a} and \pcode{n}. In line 4 we test whether
-\pcode{n} is zero, in which case we jump to the end of the
-program marked with the label \pcode{done}. If \pcode{n} is
-not zero, we multiply the content of \pcode{a} by \pcode{n},
-decrease \pcode{n} by one and jump back to the beginning of
-the loop, marked with the label \pcode{top}. Another program
-in our language is shown in Figure~\ref{fib}. I let you think
-what it calculates.
-
-\begin{figure}[t]
-\begin{lstlisting}[numbers=none,
- language={},xleftmargin=10mm]
- n := 6
- m1 := 0
- m2 := 1
-loop:
- jmp? n = 0 done
- tmp := m2
- m2 := m1 + m2
- m1 := tmp
- n := n + -1
- goto top
-done:
-\end{lstlisting}
-\caption{A mystery program in our idealised programming language.
-Try to find out what it calculates! \label{fib}}
-\end{figure}
-
-Even if our language is rather small, it is still Turing
-complete---meaning quite powerful. However, discussing this
-fact in more detail would lead us too far astray. Clearly, our
-programming is rather low-level and not very comfortable for
-writing programs. It is inspired by real machine code, which
-is the code that is executed by a CPU. So a more interesting
-question is what is missing in comparison with real machine
-code? Well, not much\ldots{}in principle. Real machine code,
-of course, contains many more arithmetic instructions (not
-just addition and multiplication) and many more conditional
-jumps. We could add these to our language if we wanted, but
-complexity is really beside the point here. Furthermore, real
-machine code has many instructions for manipulating memory. We
-do not have this at all. This is actually a more serious
-simplification because we assume numbers to be arbitrary small
-or large, which is not the case with real machine code. In
-real machine code, basic number formats have a range and might
-over-flow or under-flow this range. Also the number of
-variables in our programs is potentially unlimited, while
-memory in an actual computer, of course, is always limited. To
-sum up, our language might look ridiculously simple, but it is
-not too far removed from practically relevant issues.
-
-
-\subsubsection*{An Interpreter}
-
-Designing a language is like playing god: you can say what
-names for variables you allow; what programs should look like;
-most importantly you can decide what each part of the program
-should mean and do. While our language is quite simple and the
-meaning of statements, for example, is rather straightforward,
-there are still places where we need to make real choices. For
-example consider the conditional jumps, say the one in the
-factorial program:
-
-\begin{center}
-\code{jmp? n = 0 done}
-\end{center}
-
-\noindent How should they work? We could introduce Booleans
-(\pcode{true} and \pcode{false}) and then jump only when the
-condition is \pcode{true}. However, since we have numbers in
-our language anyway, why not just encoding \pcode{true} as
-one, and \pcode{false} as zero? In this way we can dispense
-with the additional concept of Booleans.
-
-I hope the above discussion makes it already clear we need to
-be a bit more careful with our programs. Below we shall
-describe an interpreter for our programming language, which
-specifies exactly how programs are supposed to be
-run\ldots{}at least we will specify this for all \emph{good}
-programs. By good programs I mean where all variables are
-initialised, for example. Our interpreter will just crash if
-it cannot find out the value for a variable when it is not
-initialised. Also, we will assume that labels in good programs
-are unique, otherwise our programs will calculate ``garbage''.
-
-First we will pre-process our programs. This will simplify the
-definition of our interpreter later on. By pre-processing our
-programs we will transform programs into \emph{snippets}. A
-snippet is a label and all the code that comes after the
-label. This essentially means a snippet is a \emph{map} from
-labels to code.\footnote{Be sure you know what maps are. In a
-programming context they are often represented as association
-list where some data is associated with a key.}
-
-Given that programs are sequences (or lists) of statements, we
-can easily calculate the snippets by just traversing this
-sequence and recursively generating the map. Suppose a program
-is of the general form
-
-\begin{center}
-$stmt_1\;stmt_2\; \ldots\; stmt_n$
-\end{center}
-
-\noindent The idea is to go through this sequence of
-statements one by one and check whether they are a label. If
-yes, we add the label and the remaining statements to our map.
-If no, we just continue with the next statement. To come up
-with a recursive definition for generating snippets, let us
-write $[]$ for the program that does not contain any
-statement. Consider the following definition:
-
-\begin{center}
-\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
-$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
-$\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
-$\begin{cases}
- \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\
- \textit{snippets}(rest) & \text{otherwise}
-\end{cases}$
-\end{tabular}
-\end{center}
-
-\noindent In the first clause we just return the empty map for
-the program that does not contain any statement. In the second
-clause, we have to distinguish the case where the first
-statement is a label or not. As said before, if not, then we
-just ``throw away'' the label and recursively calculate the
-snippets for the rest of the program (the otherwise clause).
-If yes, then we do the same, but also update the map so that
-$label$ now points to the rest of the statements (the if
-clause). This looks all realtively straightforward, but there
-is one small problem we need to overcome: our two programs
-shown so far have no label as \emph{entry point}---that is
-where the execution is supposed to start. We usually assume
-that the first statement will be run first. To make this the
-default, it is convenient if we add to all our programs a
-default label, say \pcode{""} (the empty string). With this we
-can define our pre-processing of programs as follows
-
-\begin{center}
-$\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$
-\end{center}
-
-\noindent Let us see how this pans out in practice. If we
-pre-process the factorial program shown earlier, we obtain the
-following map:
-
-\begin{center}
-\small
-\lstset{numbers=none,
- language={},
- xleftmargin=0mm,
- aboveskip=0.5mm,
- belowskip=0.5mm,
- frame=single,
- framerule=0mm,
- framesep=0mm}
-\begin{tikzpicture}[node distance=0mm]
-\node (A1) [draw]{\pcode{""}};
-\node (B1) [right=of A1] {$\mapsto$};
-\node (C1) [right=of B1,anchor=north west] {
-\begin{minipage}{3.5cm}
-\begin{lstlisting}[language={},xleftmargin=0mm]
- a := 1
- n := 5
-top:
- jmp? n = 0 done
- a := a * n
- n := n + -1
- goto top
-done:
-\end{lstlisting}
-\end{minipage}};
-\node (A2) [right=of C1.north east,draw] {\pcode{top}};
-\node (B2) [right=of A2] {$\mapsto$};
-\node (C2) [right=of B2, anchor=north west] {
-\begin{minipage}{3.5cm}
-\begin{lstlisting}[language={},xleftmargin=0mm]
- jmp? n = 0 done
- a := a * n
- n := n + -1
- goto top
-done:
-\end{lstlisting}
-\end{minipage}};
-\node (A3) [right=of C2.north east,draw] {\pcode{done}};
-\node (B3) [right=of A3] {$\mapsto$};
-\node (C3) [right=of B3] {$[]$};
-\end{tikzpicture}
-\end{center}
-
-\noindent I highlighted the \emph{keys} in this map. Since
-there are three labels in the factorial program (remember we
-added \pcode{""}), there are three keys. When running the
-factorial program and encountering a jump, then we only have
-to consult this snippets-map in order to find out what the
-next statements should be.
-
-We should now be in the position to define how a program
-should be run. In the context of interpreters, this
-``running'' of programs is often called \emph{evaluation}. Let
-us start with the definition of how expressions are evaluated.
-A first attempt might be the following recursive function:
-
-\begin{center}
-\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
-$\textit{eval\_exp}(\texttt{n})$ & $\dn$ & $n$
- \qquad\text{if}\; \texttt{n}\; \text{is a number like}
- \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
-\pcode{2}\ldots{}\\
-$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\,
- \texttt{e}_\texttt{2})$ &
- $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) +
- \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\
-$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\,
- \texttt{e}_\texttt{2})$ &
- $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) *
- \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\
-$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\,
- \texttt{e}_\texttt{2})$ &
- $\dn$ & $\begin{cases}
- 1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}) =
- \textit{eval\_exp}(\texttt{e}_\texttt{2})\\
- 0 & \text{otherwise}
- \end{cases}$
-\end{tabular}
-\end{center}
-
-\noindent While this should look all rather intuitive`, still
-be very careful. There is a subtlety which can be easily
-overlooked: The function \textit{eval\_exp} takes an
-expression of our programming language as input and returns a
-number as output. Therefore whenever we encounter a number in
-our program, we just return this number---this is defined in
-the first clause above. Whenever we encounter an addition,
-well then we first evaluate the left-hand side
-$\texttt{e}_\texttt{1}$ of the addition (this will give a
-number), then evaluate the right-hand side
-$\texttt{e}_\texttt{2}$ (this gives another number), and
-finally add both numbers together. Here is the subtlety: on
-the left-hand side of the $\dn$ we have a \texttt{+} (in the
-teletype font) which is the symbol for addition in our
-programming language. On the right-hand side we have $+$ which
-stands for the arithmetic operation from ``mathematics'' of
-adding two numbers. These are rather different concepts---one
-is a symbol (which we made up), and the other a mathematical
-operation. When we will have a look at an actual
-implementation of our interpreter, the mathematical operation
-will be the function for addition from the programming
-language in which we \underline{\smash{implement}} our
-interpreter. While the \texttt{+} is just a symbol that is
-used in our programming language. Clearly we have to use a
-symbol that is a good mnemonic for addition, otherwise we will
-confuse the programmers working with our language. Therefore
-we use $\texttt{+}$. A similar choice is made for times in the
-third clause and equality in the fourth clause.
-
-Remember I wrote at the beginning of this section about being
-god when designing a programming language. You can see this
-here: we need to give meaning to symbols. At the moment
-however, we are a poor fallible god. Look again at the grammar
-of our programming language and our definition. Clearly, an
-expression can contain variables. So far we have ignored them.
-What should our interpreter do with variables? They might
-change during the evaluation of a program. For example the
-variable \pcode{n} in the factorial program counts down from 5
-down to 0. How can we improve our definition above to give also
-an answer whenever our interpreter encounters a variable in an
-expression? The solution is to add an \emph{environment},
-written $env$, as an additional input argument to our
-\textit{eval\_exp} function.
-
-\begin{center}
-\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
-$\textit{eval\_exp}(\texttt{n}, env)$ & $\dn$ & $n$
- \qquad\text{if}\; \texttt{n}\; \text{is a number like}
- \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
-\pcode{2}\ldots{}\\
-$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\,
- \texttt{e}_\texttt{2}, env)$ &
- $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) +
- \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\
-$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\,
- \texttt{e}_\texttt{2}, env)$ &
- $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) *
- \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\
-$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\,
- \texttt{e}_\texttt{2}, env)$ &
- $\dn$ & $\begin{cases}
- 1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) =
- \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)\\
- 0 & \text{otherwise}
- \end{cases}$\\
-$\textit{eval\_exp}(\texttt{x}, env)$ & $\dn$ & $env(x)$
-\end{tabular}
-\end{center}
-
-\noindent This environment $env$ also acts like a map: it
-associates variables with their current values. For example
-after evaluating the first two lines in our factorial
-program, such an environment might look as follows
-
-\begin{center}
-\begin{tabular}{ll}
-$\fbox{\texttt{a}} \mapsto \texttt{1}$ &
-$\fbox{\texttt{n}} \mapsto \texttt{5}$
-\end{tabular}
-\end{center}
-
-\noindent Again I highlighted the keys. In the clause for
-variables, we can therefore consult this environment and
-return whatever value is currently stored for this variable.
-This is written $env(x)$---meaning we query this map with $x$
-we obtain the corresponding number. You might ask what happens
-if an environment does not contain any value for, say, the
-variable $x$? Well, then our interpreter just ``crashes'', or
-more precisely will raise an exception. In this case we have a
-``bad'' program that tried to use a variable before it was
-initialised. The programmer should not have done this. In a
-real programming language, we would of course try a bit harder
-and for example give an error at compile time, or design our
-language in such a way that this can never happen. With the
-second version of \textit{eval\_exp} we completed our
-definition for evaluating expressions.
-
-Next comes the evaluation function for statements. We define
-this function in such a way that we recursively evaluate a
-whole sequence of statements. Assume a program $p$ (you want
-to evaluate) and its pre-processed snippets $sn$. Then we can
-define:
-
-\begin{center}
-\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
-$\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\
-$\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ &
- $\dn$ & $\textit{eval\_stmts}(rest, env)$ \\
-$\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ &
- $\dn$ & $\textit{eval\_stmts}(rest,
- env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\
-$\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$
- & $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$\\
-$\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$
- & $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}}
- \textit{eval\_stmts}(sn(\texttt{lbl}), env)\\
- & \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\
- \textit{eval\_stmts}(rest, env) & \text{otherwise}\\
- \end{array}\end{cases}$
-\end{tabular}
-\end{center}
-
-\noindent The first clause is for the empty program, or when
-we arrived at the end of the program. In this case we just
-return the environment. The second clause is for when the next
-statement is a label. That means the program is of the form
-$\texttt{label:}\;rest$ where the label is some string and
-$rest$ stands for all following statements. This case is easy,
-because our evaluation function just discards the label and
-evaluates the rest of the statements (we already extracted all
-important information about labels when we pre-processed our
-programs and generated the snippets). The third clause is for
-variable assignments. Again we just evaluate the rest for the
-statements, but with a modified environment---since the
-variable assignment is supposed to introduce a new variable or
-change the current value of a variable. For this modification
-of the environment we first evaluate the expression
-$\texttt{e}$ using our evaluation function for expressions.
-This gives us a number. Then we assign this number to the
-variable $x$ in the environment. This modified environment
-will be used to evaluate the rest of the program. The fourth
-clause is for the unconditional jump to a label, called
-\texttt{lbl}. That means we have to look up in our snippets
-map $sn$ what are the next statements for this label.
-Therefore we will continue with evaluating, not with the rest
-of the program, but with the statements stored in the
-snippets-map under the label $\texttt{lbl}$. The fifth clause
-for conditional jumps is similar, but to decide whether to
-make the jump we first need to evaluate the expression
-$\texttt{e}$ in order to find out whether it is $1$. If yes,
-we jump, otherwise we just continue with evaluating the rest
-of the program.
-
-Our interpreter works in two stages: First we pre-process our
-program generating the snippets map $sn$, say. Second we call
-the evaluation function with the default entry point and the
-empty environment:
-
-\begin{center}
-$\textit{eval\_stmts}(sn(\pcode{""}), \varnothing)$
-\end{center}
-
-\noindent It is interesting to note that our interpreter when
-it comes to the end of the program returns an environment. Our
-programming language does not contain any constructs for input
-and output. Therefore this environment is the only effect we
-can observe when running the program (apart from that our
-interpreter might need some time before finishing the
-evaluation of the program and the CPU getting hot). Evaluating
-the factorial program with our interpreter we receive as
-``answer''-environment
-
-\begin{center}
-\begin{tabular}{ll}
-$\fbox{\texttt{a}} \mapsto \texttt{120}$ &
-$\fbox{\texttt{n}} \mapsto \texttt{0}$
-\end{tabular}
-\end{center}
-
-\noindent While the discussion above should have illustrated
-the ideas, in order to do some serious calculations, we clearly
-need to implement the interpreter.
-
-
-\subsubsection*{Scala Code for the Interpreter}
-
-Functional programming languages are very convenient for
-implementations of interpreters. A good choice for a
-functional programming language is Scala, a programming
-language that combines functional and object-oriented
-pro\-gramming-styles. It has received in the last five years or
-so quite a bit of attention. One reason for this attention is
-that, like the Java programming language, Scala compiles to
-the Java Virtual Machine (JVM) and therefore Scala programs
-can run under MacOSX, Linux and Windows.\footnote{There are
-also experimental backends for Android and JavaScript.} Unlike
-Java, however, Scala often allows programmers to write very
-concise and elegant code. Some therefore say Scala is the much
-better Java. A number of companies, The Guardian, Twitter,
-Coursera, FourSquare, LinkedIn to name a few, either use Scala
-exclusively in production code, or at least to some
-substantial degree. If you want to try out Scala yourself, the
-Scala compiler can be downloaded from
-
-\begin{quote}
-\url{http://www.scala-lang.org}
-\end{quote}
-
-Let us have a look at the Scala code shown in
-Figure~\ref{code}. It shows the entire code for the
-interpreter, though the implementation is admittedly no
-frills.
-
-\begin{figure}[t]
-\small
-\lstinputlisting[language=Scala]{../progs/inter.scala}
-\caption{The entire code of the interpreter for our
-idealised programming language.\label{code}}
-\end{figure}
-
-
-\subsubsection*{Static Analysis}
-
-Finally we can come back to our original problem, namely
-finding out what the signs of variables are
-
-\begin{center}
-
-
-\end{center}
-
-\end{document}
-
-%% list of static analysers for C
-http://spinroot.com/static/index.html
-
-%% NASA coding rules for C
-http://pixelscommander.com/wp-content/uploads/2014/12/P10.pdf
-
-%%% Local Variables:
-%%% mode: latex
-%%% TeX-master: t
-%%% End:
Binary file handouts/ho98-sa.pdf has changed
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/handouts/ho98-sa.tex Sat Sep 23 19:52:27 2017 +0100
@@ -0,0 +1,615 @@
+\documentclass{article}
+\usepackage{../style}
+\usepackage{../langs}
+\usepackage{../graphics}
+\usepackage{../grammar}
+\usepackage{multicol}
+
+\newcommand{\dn}{\stackrel{\mbox{\scriptsize def}}{=}}
+
+\begin{document}
+\fnote{\copyright{} Christian Urban, King's College London, 2014}
+
+%% why are shuttle flights so good with software
+%%http://www.fastcompany.com/28121/they-write-right-stuff
+
+\section*{Handout 9 (Static Analysis)}
+
+If we want to improve the safety and security of our programs,
+we need a more principled approach to programming. Testing is
+good, but as Edsger Dijkstra famously wrote:
+
+\begin{quote}\it
+``Program testing can be a very effective way to show the
+\underline{\smash{presence}} of bugs, but it is hopelessly
+inadequate for showing their \underline{\smash{absence}}.''
+\end{quote}
+
+\noindent While such a more principled approach has been the
+subject of intense study for a long, long time, only in the
+past few years some impressive results have been achieved. One
+is the complete formalisation and (mathematical) verification
+of a microkernel operating system called seL4.
+
+\begin{center}
+\url{http://sel4.systems}
+\end{center}
+
+\noindent In 2011 this work was included in the MIT Technology
+Review in the annual list of the world’s ten most important
+emerging
+technologies.\footnote{\url{http://www2.technologyreview.com/tr10/?year=2011}}
+While this work is impressive, its technical details are too
+enormous for an explanation here. Therefore let us look at
+something much simpler, namely finding out properties about
+programs using \emph{static analysis}.
+
+Static analysis is a technique that checks properties of a
+program without actually running the program. This should
+raise alarm bells with you---because almost all interesting
+properties about programs are equivalent to the halting
+problem, which we know is undecidable. For example estimating
+the memory consumption of programs is in general undecidable,
+just like the halting problem. Static analysis circumvents
+this undecidability-problem by essentially allowing answers
+\emph{yes} and \emph{no}, but also \emph{don't know}. With
+this ``trick'' even the halting problem becomes
+decidable\ldots{}for example we could always say \emph{don't
+know}. Of course this would be silly. The point is that we
+should be striving for a method that answers as often as
+possible either \emph{yes} or \emph{no}---just in cases when
+it is too difficult we fall back on the
+\emph{don't-know}-answer. This might sound all like abstract
+nonsense. Therefore let us look at a concrete example.
+
+
+\subsubsection*{A Simple, Idealised Programming Language}
+
+Our starting point is a small, idealised programming language.
+It is idealised because we cut several corners in comparison
+with real programming languages. The language we will study
+contains, amongst other things, variables holding integers.
+Using static analysis, we want to find out what the sign of
+these integers (positive or negative) will be when the program
+runs. This sign-analysis seems like a very simple problem. But
+even such simple problems, if approached naively, are in
+general undecidable, just like Turing's halting problem. I let
+you think why?
+
+
+Is sign-analysis of variables an interesting problem? Well,
+yes---if a compiler can find out that for example a variable
+will never be negative and this variable is used as an index
+for an array, then the compiler does not need to generate code
+for an underflow-check. Remember some languages are immune to
+buffer-overflow attacks, but they need to add underflow and
+overflow checks everywhere. According to John Regher, an
+expert in the field of compilers, overflow checks can cause
+5-10\% slowdown, in some languages even 100\% for tight
+loops.\footnote{\url{http://blog.regehr.org/archives/1154}} If
+the compiler can omit the underflow check, for example, then
+this can potentially drastically speed up the generated code.
+
+What do programs in our simple programming language look like?
+The following grammar gives a first specification:
+
+\begin{multicols}{2}
+\begin{plstx}[rhs style=,one per line,left margin=9mm]
+: \meta{Stmt} ::= \meta{label} \texttt{:}
+ | \meta{var} \texttt{:=} \meta{Exp}
+ | \texttt{jmp?} \meta{Exp} \meta{label}
+ | \texttt{goto} \meta{label}\\
+: \meta{Prog} ::= \meta{Stmt} \ldots{} \meta{Stmt}\\
+\end{plstx}
+\columnbreak
+\begin{plstx}[rhs style=,one per line]
: \meta{Exp} ::= \meta{Exp} \texttt{+} \meta{Exp}
+ | \meta{Exp} \texttt{*} \meta{Exp}
+ | \meta{Exp} \texttt{=} \meta{Exp}
+ | \meta{num}
+ | \meta{var}\\
+\end{plstx}
+\end{multicols}
+
+\noindent I assume you are familiar with such
+grammars.\footnote{\url{http://en.wikipedia.org/wiki/Backus–Naur_Form}}
+There are three main syntactic categories: \emph{statments}
+and \emph{expressions} as well as \emph{programs}, which are
+sequences of statements. Statements are either labels,
+variable assignments, conditional jumps (\pcode{jmp?}) and
+unconditional jumps (\pcode{goto}). Labels are just strings,
+which can be used as the target of a jump. We assume that in
+every program the labels are unique---if there is a clash,
+then we do not know where to jump to. The conditional jumps
+and variable assignments involve (arithmetic) expressions.
+Expressions are either numbers, variables or compound
+expressions built up from \pcode{+}, \pcode{*} and \emph{=}
+(for simplicity reasons we do not consider any other
+operations). We assume we have negative and positive numbers,
+\ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
+\pcode{2}\ldots{} An example program that calculates the
+factorial of 5 is in our programming language as follows:
+
+\begin{lstlisting}[language={},xleftmargin=10mm]
+ a := 1
+ n := 5
+top:
+ jmp? n = 0 done
+ a := a * n
+ n := n + -1
+ goto top
+done:
+\end{lstlisting}
+
+\noindent As can be seen each line of the program contains a
+statement. In the first two lines we assign values to the
+variables \pcode{a} and \pcode{n}. In line 4 we test whether
+\pcode{n} is zero, in which case we jump to the end of the
+program marked with the label \pcode{done}. If \pcode{n} is
+not zero, we multiply the content of \pcode{a} by \pcode{n},
+decrease \pcode{n} by one and jump back to the beginning of
+the loop, marked with the label \pcode{top}. Another program
+in our language is shown in Figure~\ref{fib}. I let you think
+what it calculates.
+
+\begin{figure}[t]
+\begin{lstlisting}[numbers=none,
+ language={},xleftmargin=10mm]
+ n := 6
+ m1 := 0
+ m2 := 1
+loop:
+ jmp? n = 0 done
+ tmp := m2
+ m2 := m1 + m2
+ m1 := tmp
+ n := n + -1
+ goto top
+done:
+\end{lstlisting}
+\caption{A mystery program in our idealised programming language.
+Try to find out what it calculates! \label{fib}}
+\end{figure}
+
+Even if our language is rather small, it is still Turing
+complete---meaning quite powerful. However, discussing this
+fact in more detail would lead us too far astray. Clearly, our
+programming is rather low-level and not very comfortable for
+writing programs. It is inspired by real machine code, which
+is the code that is executed by a CPU. So a more interesting
+question is what is missing in comparison with real machine
+code? Well, not much\ldots{}in principle. Real machine code,
+of course, contains many more arithmetic instructions (not
+just addition and multiplication) and many more conditional
+jumps. We could add these to our language if we wanted, but
+complexity is really beside the point here. Furthermore, real
+machine code has many instructions for manipulating memory. We
+do not have this at all. This is actually a more serious
+simplification because we assume numbers to be arbitrary small
+or large, which is not the case with real machine code. In
+real machine code, basic number formats have a range and might
+over-flow or under-flow this range. Also the number of
+variables in our programs is potentially unlimited, while
+memory in an actual computer, of course, is always limited. To
+sum up, our language might look ridiculously simple, but it is
+not too far removed from practically relevant issues.
+
+
+\subsubsection*{An Interpreter}
+
+Designing a language is like playing god: you can say what
+names for variables you allow; what programs should look like;
+most importantly you can decide what each part of the program
+should mean and do. While our language is quite simple and the
+meaning of statements, for example, is rather straightforward,
+there are still places where we need to make real choices. For
+example consider the conditional jumps, say the one in the
+factorial program:
+
+\begin{center}
+\code{jmp? n = 0 done}
+\end{center}
+
+\noindent How should they work? We could introduce Booleans
+(\pcode{true} and \pcode{false}) and then jump only when the
+condition is \pcode{true}. However, since we have numbers in
+our language anyway, why not just encoding \pcode{true} as
+one, and \pcode{false} as zero? In this way we can dispense
+with the additional concept of Booleans.
+
+I hope the above discussion makes it already clear we need to
+be a bit more careful with our programs. Below we shall
+describe an interpreter for our programming language, which
+specifies exactly how programs are supposed to be
+run\ldots{}at least we will specify this for all \emph{good}
+programs. By good programs I mean where all variables are
+initialised, for example. Our interpreter will just crash if
+it cannot find out the value for a variable when it is not
+initialised. Also, we will assume that labels in good programs
+are unique, otherwise our programs will calculate ``garbage''.
+
+First we will pre-process our programs. This will simplify the
+definition of our interpreter later on. By pre-processing our
+programs we will transform programs into \emph{snippets}. A
+snippet is a label and all the code that comes after the
+label. This essentially means a snippet is a \emph{map} from
+labels to code.\footnote{Be sure you know what maps are. In a
+programming context they are often represented as association
+list where some data is associated with a key.}
+
+Given that programs are sequences (or lists) of statements, we
+can easily calculate the snippets by just traversing this
+sequence and recursively generating the map. Suppose a program
+is of the general form
+
+\begin{center}
+$stmt_1\;stmt_2\; \ldots\; stmt_n$
+\end{center}
+
+\noindent The idea is to go through this sequence of
+statements one by one and check whether they are a label. If
+yes, we add the label and the remaining statements to our map.
+If no, we just continue with the next statement. To come up
+with a recursive definition for generating snippets, let us
+write $[]$ for the program that does not contain any
+statement. Consider the following definition:
+
+\begin{center}
+\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
+$\textit{snippets}([])$ & $\dn$ & $\varnothing$\\
+$\textit{snippets}(stmt\;\; rest)$ & $\dn$ &
+$\begin{cases}
+ \textit{snippets}(rest)[label := rest] & \text{if}\;stmt = \textit{label:}\\
+ \textit{snippets}(rest) & \text{otherwise}
+\end{cases}$
+\end{tabular}
+\end{center}
+
+\noindent In the first clause we just return the empty map for
+the program that does not contain any statement. In the second
+clause, we have to distinguish the case where the first
+statement is a label or not. As said before, if not, then we
+just ``throw away'' the label and recursively calculate the
+snippets for the rest of the program (the otherwise clause).
+If yes, then we do the same, but also update the map so that
+$label$ now points to the rest of the statements (the if
+clause). This looks all realtively straightforward, but there
+is one small problem we need to overcome: our two programs
+shown so far have no label as \emph{entry point}---that is
+where the execution is supposed to start. We usually assume
+that the first statement will be run first. To make this the
+default, it is convenient if we add to all our programs a
+default label, say \pcode{""} (the empty string). With this we
+can define our pre-processing of programs as follows
+
+\begin{center}
+$\textit{preproc}(prog) \dn \textit{snippets}(\pcode{"":}\;\; prog)$
+\end{center}
+
+\noindent Let us see how this pans out in practice. If we
+pre-process the factorial program shown earlier, we obtain the
+following map:
+
+\begin{center}
+\small
+\lstset{numbers=none,
+ language={},
+ xleftmargin=0mm,
+ aboveskip=0.5mm,
+ belowskip=0.5mm,
+ frame=single,
+ framerule=0mm,
+ framesep=0mm}
+\begin{tikzpicture}[node distance=0mm]
+\node (A1) [draw]{\pcode{""}};
+\node (B1) [right=of A1] {$\mapsto$};
+\node (C1) [right=of B1,anchor=north west] {
+\begin{minipage}{3.5cm}
+\begin{lstlisting}[language={},xleftmargin=0mm]
+ a := 1
+ n := 5
+top:
+ jmp? n = 0 done
+ a := a * n
+ n := n + -1
+ goto top
+done:
+\end{lstlisting}
+\end{minipage}};
+\node (A2) [right=of C1.north east,draw] {\pcode{top}};
+\node (B2) [right=of A2] {$\mapsto$};
+\node (C2) [right=of B2, anchor=north west] {
+\begin{minipage}{3.5cm}
+\begin{lstlisting}[language={},xleftmargin=0mm]
+ jmp? n = 0 done
+ a := a * n
+ n := n + -1
+ goto top
+done:
+\end{lstlisting}
+\end{minipage}};
+\node (A3) [right=of C2.north east,draw] {\pcode{done}};
+\node (B3) [right=of A3] {$\mapsto$};
+\node (C3) [right=of B3] {$[]$};
+\end{tikzpicture}
+\end{center}
+
+\noindent I highlighted the \emph{keys} in this map. Since
+there are three labels in the factorial program (remember we
+added \pcode{""}), there are three keys. When running the
+factorial program and encountering a jump, then we only have
+to consult this snippets-map in order to find out what the
+next statements should be.
+
+We should now be in the position to define how a program
+should be run. In the context of interpreters, this
+``running'' of programs is often called \emph{evaluation}. Let
+us start with the definition of how expressions are evaluated.
+A first attempt might be the following recursive function:
+
+\begin{center}
+\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
+$\textit{eval\_exp}(\texttt{n})$ & $\dn$ & $n$
+ \qquad\text{if}\; \texttt{n}\; \text{is a number like}
+ \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
+\pcode{2}\ldots{}\\
+$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\,
+ \texttt{e}_\texttt{2})$ &
+ $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) +
+ \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\
+$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\,
+ \texttt{e}_\texttt{2})$ &
+ $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}) *
+ \textit{eval\_exp}(\texttt{e}_\texttt{2})$\\
+$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\,
+ \texttt{e}_\texttt{2})$ &
+ $\dn$ & $\begin{cases}
+ 1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}) =
+ \textit{eval\_exp}(\texttt{e}_\texttt{2})\\
+ 0 & \text{otherwise}
+ \end{cases}$
+\end{tabular}
+\end{center}
+
+\noindent While this should look all rather intuitive`, still
+be very careful. There is a subtlety which can be easily
+overlooked: The function \textit{eval\_exp} takes an
+expression of our programming language as input and returns a
+number as output. Therefore whenever we encounter a number in
+our program, we just return this number---this is defined in
+the first clause above. Whenever we encounter an addition,
+well then we first evaluate the left-hand side
+$\texttt{e}_\texttt{1}$ of the addition (this will give a
+number), then evaluate the right-hand side
+$\texttt{e}_\texttt{2}$ (this gives another number), and
+finally add both numbers together. Here is the subtlety: on
+the left-hand side of the $\dn$ we have a \texttt{+} (in the
+teletype font) which is the symbol for addition in our
+programming language. On the right-hand side we have $+$ which
+stands for the arithmetic operation from ``mathematics'' of
+adding two numbers. These are rather different concepts---one
+is a symbol (which we made up), and the other a mathematical
+operation. When we will have a look at an actual
+implementation of our interpreter, the mathematical operation
+will be the function for addition from the programming
+language in which we \underline{\smash{implement}} our
+interpreter. While the \texttt{+} is just a symbol that is
+used in our programming language. Clearly we have to use a
+symbol that is a good mnemonic for addition, otherwise we will
+confuse the programmers working with our language. Therefore
+we use $\texttt{+}$. A similar choice is made for times in the
+third clause and equality in the fourth clause.
+
+Remember I wrote at the beginning of this section about being
+god when designing a programming language. You can see this
+here: we need to give meaning to symbols. At the moment
+however, we are a poor fallible god. Look again at the grammar
+of our programming language and our definition. Clearly, an
+expression can contain variables. So far we have ignored them.
+What should our interpreter do with variables? They might
+change during the evaluation of a program. For example the
+variable \pcode{n} in the factorial program counts down from 5
+down to 0. How can we improve our definition above to give also
+an answer whenever our interpreter encounters a variable in an
+expression? The solution is to add an \emph{environment},
+written $env$, as an additional input argument to our
+\textit{eval\_exp} function.
+
+\begin{center}
+\begin{tabular}{l@{\hspace{1mm}}c@{\hspace{1mm}}l}
+$\textit{eval\_exp}(\texttt{n}, env)$ & $\dn$ & $n$
+ \qquad\text{if}\; \texttt{n}\; \text{is a number like}
+ \ldots \pcode{-2}, \pcode{-1}, \pcode{0}, \pcode{1},
+\pcode{2}\ldots{}\\
+$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{+}\,
+ \texttt{e}_\texttt{2}, env)$ &
+ $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) +
+ \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\
+$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{*}\,
+ \texttt{e}_\texttt{2}, env)$ &
+ $\dn$ & $\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) *
+ \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)$\\
+$\textit{eval\_exp}(\texttt{e}_\texttt{1} \,\texttt{=}\,
+ \texttt{e}_\texttt{2}, env)$ &
+ $\dn$ & $\begin{cases}
+ 1 & \text{if}\;\textit{eval\_exp}(\texttt{e}_\texttt{1}, env) =
+ \textit{eval\_exp}(\texttt{e}_\texttt{2}, env)\\
+ 0 & \text{otherwise}
+ \end{cases}$\\
+$\textit{eval\_exp}(\texttt{x}, env)$ & $\dn$ & $env(x)$
+\end{tabular}
+\end{center}
+
+\noindent This environment $env$ also acts like a map: it
+associates variables with their current values. For example
+after evaluating the first two lines in our factorial
+program, such an environment might look as follows
+
+\begin{center}
+\begin{tabular}{ll}
+$\fbox{\texttt{a}} \mapsto \texttt{1}$ &
+$\fbox{\texttt{n}} \mapsto \texttt{5}$
+\end{tabular}
+\end{center}
+
+\noindent Again I highlighted the keys. In the clause for
+variables, we can therefore consult this environment and
+return whatever value is currently stored for this variable.
+This is written $env(x)$---meaning we query this map with $x$
+we obtain the corresponding number. You might ask what happens
+if an environment does not contain any value for, say, the
+variable $x$? Well, then our interpreter just ``crashes'', or
+more precisely will raise an exception. In this case we have a
+``bad'' program that tried to use a variable before it was
+initialised. The programmer should not have done this. In a
+real programming language, we would of course try a bit harder
+and for example give an error at compile time, or design our
+language in such a way that this can never happen. With the
+second version of \textit{eval\_exp} we completed our
+definition for evaluating expressions.
+
+Next comes the evaluation function for statements. We define
+this function in such a way that we recursively evaluate a
+whole sequence of statements. Assume a program $p$ (you want
+to evaluate) and its pre-processed snippets $sn$. Then we can
+define:
+
+\begin{center}
+\begin{tabular}{@{}l@{\hspace{1mm}}c@{\hspace{1mm}}l@{}}
+$\textit{eval\_stmts}([], env)$ & $\dn$ & $env$\\
+$\textit{eval\_stmts}(\texttt{label:}\;rest, env)$ &
+ $\dn$ & $\textit{eval\_stmts}(rest, env)$ \\
+$\textit{eval\_stmts}(\texttt{x\,:=\,e}\;rest, env)$ &
+ $\dn$ & $\textit{eval\_stmts}(rest,
+ env[x := \textit{eval\_exp}(\texttt{e}, env)])$\\
+$\textit{eval\_stmts}(\texttt{goto\,lbl}\;rest, env)$
+ & $\dn$ & $\textit{eval\_stmts}(sn(\texttt{lbl}), env)$\\
+$\textit{eval\_stmts}(\texttt{jmp?\,e\,lbl}\;rest, env)$
+ & $\dn$ & $\begin{cases}\begin{array}{@{}l@{\hspace{-12mm}}r@{}}
+ \textit{eval\_stmts}(sn(\texttt{lbl}), env)\\
+ & \text{if}\;\textit{eval\_exp}(\texttt{e}, env) = 1\\
+ \textit{eval\_stmts}(rest, env) & \text{otherwise}\\
+ \end{array}\end{cases}$
+\end{tabular}
+\end{center}
+
+\noindent The first clause is for the empty program, or when
+we arrived at the end of the program. In this case we just
+return the environment. The second clause is for when the next
+statement is a label. That means the program is of the form
+$\texttt{label:}\;rest$ where the label is some string and
+$rest$ stands for all following statements. This case is easy,
+because our evaluation function just discards the label and
+evaluates the rest of the statements (we already extracted all
+important information about labels when we pre-processed our
+programs and generated the snippets). The third clause is for
+variable assignments. Again we just evaluate the rest for the
+statements, but with a modified environment---since the
+variable assignment is supposed to introduce a new variable or
+change the current value of a variable. For this modification
+of the environment we first evaluate the expression
+$\texttt{e}$ using our evaluation function for expressions.
+This gives us a number. Then we assign this number to the
+variable $x$ in the environment. This modified environment
+will be used to evaluate the rest of the program. The fourth
+clause is for the unconditional jump to a label, called
+\texttt{lbl}. That means we have to look up in our snippets
+map $sn$ what are the next statements for this label.
+Therefore we will continue with evaluating, not with the rest
+of the program, but with the statements stored in the
+snippets-map under the label $\texttt{lbl}$. The fifth clause
+for conditional jumps is similar, but to decide whether to
+make the jump we first need to evaluate the expression
+$\texttt{e}$ in order to find out whether it is $1$. If yes,
+we jump, otherwise we just continue with evaluating the rest
+of the program.
+
+Our interpreter works in two stages: First we pre-process our
+program generating the snippets map $sn$, say. Second we call
+the evaluation function with the default entry point and the
+empty environment:
+
+\begin{center}
+$\textit{eval\_stmts}(sn(\pcode{""}), \varnothing)$
+\end{center}
+
+\noindent It is interesting to note that our interpreter when
+it comes to the end of the program returns an environment. Our
+programming language does not contain any constructs for input
+and output. Therefore this environment is the only effect we
+can observe when running the program (apart from that our
+interpreter might need some time before finishing the
+evaluation of the program and the CPU getting hot). Evaluating
+the factorial program with our interpreter we receive as
+``answer''-environment
+
+\begin{center}
+\begin{tabular}{ll}
+$\fbox{\texttt{a}} \mapsto \texttt{120}$ &
+$\fbox{\texttt{n}} \mapsto \texttt{0}$
+\end{tabular}
+\end{center}
+
+\noindent While the discussion above should have illustrated
+the ideas, in order to do some serious calculations, we clearly
+need to implement the interpreter.
+
+
+\subsubsection*{Scala Code for the Interpreter}
+
+Functional programming languages are very convenient for
+implementations of interpreters. A good choice for a
+functional programming language is Scala, a programming
+language that combines functional and object-oriented
+pro\-gramming-styles. It has received in the last five years or
+so quite a bit of attention. One reason for this attention is
+that, like the Java programming language, Scala compiles to
+the Java Virtual Machine (JVM) and therefore Scala programs
+can run under MacOSX, Linux and Windows.\footnote{There are
+also experimental backends for Android and JavaScript.} Unlike
+Java, however, Scala often allows programmers to write very
+concise and elegant code. Some therefore say Scala is the much
+better Java. A number of companies, The Guardian, Twitter,
+Coursera, FourSquare, LinkedIn to name a few, either use Scala
+exclusively in production code, or at least to some
+substantial degree. If you want to try out Scala yourself, the
+Scala compiler can be downloaded from
+
+\begin{quote}
+\url{http://www.scala-lang.org}
+\end{quote}
+
+Let us have a look at the Scala code shown in
+Figure~\ref{code}. It shows the entire code for the
+interpreter, though the implementation is admittedly no
+frills.
+
+\begin{figure}[t]
+\small
+\lstinputlisting[language=Scala]{../progs/inter.scala}
+\caption{The entire code of the interpreter for our
+idealised programming language.\label{code}}
+\end{figure}
+
+
+\subsubsection*{Static Analysis}
+
+Finally we can come back to our original problem, namely
+finding out what the signs of variables are
+
+\begin{center}
+
+
+\end{center}
+
+\end{document}
+
+%% list of static analysers for C
+http://spinroot.com/static/index.html
+
+%% NASA coding rules for C
+http://pixelscommander.com/wp-content/uploads/2014/12/P10.pdf
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: t
+%%% End:
Binary file handouts/ho99-zkp.pdf has changed
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/handouts/ho99-zkp.tex Sat Sep 23 19:52:27 2017 +0100
@@ -0,0 +1,622 @@
+\documentclass{article}
+\usepackage{../style}
+
+\begin{document}
+\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015}
+
+\section*{Handout 6 (Zero-Knowledge Proofs)}
+
+Zero-knowledge proofs (short ZKP) solve a paradoxical puzzle:
+How to convince somebody else that one knows a secret, without
+revealing what the secret actually is? This sounds like a
+problem the Mad Hatter from Alice in Wonderland would occupy
+himself with, but actually there some serious and not so
+serious applications of it. For example, if you solve
+crosswords with your friend, say Bob, you might want to
+convince him that you found a solution for one question, but
+of course you do not want to reveal the solution, as this
+might give Bob an advantage somewhere else in the crossword.
+
+So how to convince Bob that you know the answer (or a secret)?
+One way would be to come up with the following protocol:
+Suppose the answer is \textit{folio}. Then look up the
+definition of \textit{folio} in a dictionary. Say you find:
+
+\begin{quote}
+``an \textit{individual} leaf of paper or parchment, either
+loose as one of a series or forming part of a bound volume,
+which is numbered on the recto or front side only.''
+\end{quote}
+
+\noindent
+Take the first non-small word\footnote{Let's say the, a, an,
+or, and \ldots fall into the category of small words.} in this definition,
+in this case \textit{individual}, and look up the definition
+of this word, say
+
+\begin{quote}
+``a single \textit{human} being as distinct from a group''
+\end{quote}
+
+\noindent
+In this definition take the second non-small word, that
+is \textit{human}, and again look up the definition of this
+word. This will yield
+
+\begin{quote}
+``relating to or \textit{characteristic} of humankind''
+\end{quote}
+
+\noindent
+You could go on looking up the definition of the third
+non-small word in this definition and so on. But let us assume
+you agreed with Bob to stop after three iterations with the
+third non-article word in the last definition, that is
+\textit{or}. Now, instead of sending to Bob the solution
+\textit{folio}, you send to him \textit{characteristic}.
+
+How can Bob verify that you know the solution? Well, once he
+solved it himself, he can use the dictionary and follow the
+same ``trail'' as you did. If the final word agrees with what
+you had sent him, he must infer you knew the solution earlier
+than him. This protocol works like a one-way hash function
+because \textit{characteristic} does not give any hint as to
+what was the first word was. I leave you to think why this
+protocol avoids small words?
+
+After Bob found his solution and verified that according to
+the protocol it ``maps'' also to \textit{characteristic}, can
+he be entirely sure no cheating is going on? Not with 100\%
+certainty. It could have been possible that he was given
+\textit{characteristic} as the word, but it derived from a
+different word. This might seem very unlikely, but at least
+theoretical it is a possibility. Protocols based on
+zero-knowledge proofs will produce a similar result---they
+give an answer that might be erroneous in a very small number
+of cases. The point is to iterate them long enough so that the
+theoretical possibility of cheating is negligibly small.
+
+By the way, the authors of the paper ``Dismantling Megamos
+Crypto: Wirelessly Lockpicking a Vehicle Immobilizer'' who
+were barred from publishing their results used also a hash to
+prove they did the work and (presumably) managed to get into
+cars without a key; see Figure~\ref{paper}. This is very
+similar to the method above about crosswords: They like to
+prove that they did the work, but not giving out the
+``solution''. But this also shows what the problem with such a
+method is: yes, we can hide the secret temporarily, but if
+somebody else wants to verify it, then the secret has to be
+made public. Bob needs to know that \textit{folio} is the
+solution before he can verify the claim of Alice that she had
+the solution first. Similarly with the car-crypto paper: we
+needed to wait until September 2015 when the authors were
+finally able to publish their findings in order to verify the
+hash. Zero-knowledge proofs, in contrast, can be immediately
+checked, even if the secret is not public yet and perhaps
+never will be.
+
+\begin{figure}
+\begin{center}
+\addtolength{\fboxsep}{4mm}
+\fbox{\includegraphics[scale=0.4]{../pics/Dismantling_Megamos_Crypto.png}}
+\end{center}
+\caption{The authors of this paper used a hash in order to prove
+later that they have managed to break into cars.\label{paper}}
+\end{figure}
+
+
+\subsubsection*{ZKP: An Illustrative Example}
+
+The idea behind zero-knowledge proofs is not very obvious and
+will surely take some time for you to digest. Therefore let us
+start with a simple illustrative example. The example will not
+be perfect, but hopefully explain the gist of the idea. The
+example is called Alibaba's cave, which graphically looks as
+follows:\footnote{The example is taken from an
+article titled ``How to Explain Zero-Knowledge Protocols
+to Your Children'' available from
+\url{http://pages.cs.wisc.edu/~mkowalcz/628.pdf}.}
+
+\begin{center}
+\begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c}
+\includegraphics[scale=0.1]{../pics/alibaba1.png} &
+\includegraphics[scale=0.1]{../pics/alibaba2.png} &
+\includegraphics[scale=0.1]{../pics/alibaba3.png} \\
+Step 1 & Step 2 & Step 3
+\end{tabular}
+\end{center}
+
+\noindent Let us take a closer look at the picture in Step 1:
+The cave has a tunnel which forks at some point. Both forks
+are connected in a loop. At the deep end of the loop is a
+magic wand. The point of the magic wand is that Alice knows
+the secret word for how to open it. She wants to keep the word
+secret, but wants to convince Bob that she knows it.
+
+One way of course would be to let Bob follow her, but then he
+would also find out the secret. So this does not work. To find
+a solution, let us first fix the rules: At the beginning Alice
+and Bob are outside the cave. Alice goes in alone and takes
+either tunnel labelled $A$ in the picture, or the other tunnel
+labelled $B$. She waits at the magic wand for the instructions
+from Bob, who also goes into the gave and observes what
+happens at the fork. He has no knowledge which tunnel Alice
+took and calls out (in Step 2) that she should emerge from tunnel
+$A$, for example. If she knows the secret for opening the
+wand, this will not be a problem for Alice. If she was already
+in the $A$-segment of the tunnel, then she just comes back. If
+she was in the $B$-segment of the tunnel she will say the magic
+word and goes through the wand to emerge from $A$ as requested
+by Bob.
+
+Let us have a look at the case where Alice cheats, that is not
+knows the secret. She would still go into the cave and use,
+for example the $B$-segment of the tunnel. If now Bob says she
+should emerge from $B$, she is lucky. But if he says she
+should emerge from $A$ then Alice is in trouble: Bob will find
+out she does not actually know the secret. So in order to fool
+Bob she needs to anticipate his call, and already go into the
+corresponding tunnel. This of course also does not work, since
+Bob can make any call he wants. Consequently in order to find
+out whether Alice cheats, Bob just needs to repeat this
+protocol many times. Each time Alice has a chance of
+$\frac{1}{2}$ to be lucky or being found out. Iterating this
+$n$ times means she must be right every time and when
+cheating: the probability for this is $\frac{1}{2}^n$, number
+that for already relatively small $n$, say 10, is incredibly
+small.
+
+
+There are some interesting observations we can make about
+Alibaba's cave and the ZKP protocol between Alice and Bob:
+
+\begin{itemize}
+\item {\bf Completeness} If Alice knows the secret, Bob
+ accepts Alice ``proof'' for sure. There is no error
+ possible in that Bob thinks Alice cheats when she
+ actually knows the secret.
+\item {\bf Soundness} If Alice does not know the secret,
+ Bob accepts her ``proof'' with a very small probability.
+ If, as in the example above, the probability of being
+ able to hide cheating is $\frac{1}{2}$ in each round
+ it will be $\frac{1}{2}^n$ after $n$-rounds, which even
+ for small $n$ say $> 10$ is very small indeed.
+\item {\bf Zero-Knowledge} Even if Bob accepts
+ the proof by Alice, he cannot convince anybody else.
+\end{itemize}
+
+\noindent The last property is the most interesting one.
+Assume Alice has convinced Bob that she knows the secret and
+Bob filmed the whole protocol with a camera. Can he use the
+video to convince anybody else? After a moment of thought, you
+will agree that this is not the case. Alice and Bob might have
+just made it all up and colluded by Bob telling Alice
+beforehand which tunnel he will call. In this way it appears
+as if all iterations of the protocol were successful, but they
+prove nothing. If another person wants to find out whether
+Alice knows the secret, he or she would have to conduct the
+protocol again. This is actually the formal definition of a
+zero-knowledge proof: an independent observer cannot
+distinguish between a real protocol (where Alice knows the
+secret) and a fake one (where Bob and Alice colluded).
+
+
+\subsubsection*{Using an Graph-Isomorphism Problem for ZKPs}
+
+Now the question is how can we translate Alibaba's cave into a
+computer science solution? It turns out we need an NP problem
+for that. The main feature of an NP problem is that it is
+computational very hard to generate a solution, but it is very
+easy to check whether a given solution indeed solves the
+problem at hand.\footnote{The question whether $P = NP$ or not,
+we leave to others to speculate about.}
+
+One NP problem is the \emph{graph isomorphism problem}. It
+essentially determines whether the following two graphs, say
+$G_1$ and $G_2$, can be moved and stretched so that they look
+exactly the same.
+
+\begin{center}
+\begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c}
+$G_1$ & $G_2$\\
+\raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple.png}} &
+\raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple-b.png}}&
+
+\footnotesize
+\begin{tabular}{rl}
+Graph $G_1$ & Graph $G_2$\\
+a & 1\\
+b & 6\\
+c & 8\\
+d & 3\\
+g & 5\\
+h & 2\\
+i & 4\\
+j & 7\\
+\end{tabular}
+\end{tabular}
+\end{center}
+
+\noindent The table on the right gives a mapping of the nodes
+of the first graph to the nodes of the second. With this
+mapping we can check: node $a$ is connected in $G_1$ with $g$,
+$h$ and $i$. Node $a$ maps to node $1$ in $G_2$, which is
+connected to $2$, $4$ and $5$, which again correspond via the
+mapping to $h$, $i$ and $g$ respectively. Let us write
+$\sigma$ for such a table and let us write
+
+\[G_1 = \sigma(G_2)\]
+
+\noindent for two isomorphic graphs ($\sigma$ being the
+isomorphism). It is actually very easy to construct two
+isomorphic graphs: Start with an arbitrary graph, re-label the
+nodes consistently. Alice will need to do this frequently
+for the protocol below. In order to be useful, we therefore
+would need to consider substantially larger graphs, like
+with thousand nodes.
+
+Now the secret which Alice tries to hide is the knowledge of
+such an isomorphism $\sigma$ between two such graphs. But she
+can convince Bob whether she knows such an isomorphism for two
+graphs, say $G_1$ and $G_2$, using the same idea as in the
+example with Alibaba's cave. For this Alice and Bob must
+follow the following protocol:
+
+\begin{enumerate}
+\item Alice generates an isomorphic graph $H$ which she sends
+ to Bob (in each iteration she needs to generate a
+ different $H$).
+\item
+ After receiving $H$, Bob asks Alice either for an
+ isomorphism between $G_1$ and $H$, or $G_2$ and $H$.
+\item Alice and Bob repeat this procedure $n$ times.
+\end{enumerate}
+
+\noindent In Step 1 it is important that Alice always
+generates a fresh isomorphic graph. I let you think what
+would happen if Alice sends out twice the same graph $H$.
+
+As said before, this is relatively easy to generate by
+consistently relabelling nodes. If she started from $G_1$,
+Alice will have generated
+
+\begin{equation}
+H = \sigma'(G_1)\label{hiso}
+\end{equation}
+
+\noindent where $\sigma'$ is the isomorphism between $H$ and
+$G_1$. But she could equally have started from $G_2$. In the
+case where $G_1$ and $G_2$ are isomorphic, if $H$ is
+isomorphic with $G_1$, it will also be isomorphic with
+$G_2$, and vice versa.
+
+After generating $H$, Alice sends it to Bob. The important
+point is that she needs to ``commit'' to this $H$, therefore
+this kind of zero-knowledge protocols are called
+\emph{commitment protocols}. Only after receiving $H$, Bob
+will make up his mind about which isomorphism he asks
+for---whether between $H$ and $G_1$ or $H$ and $G_2$. For this
+he could flip a coin, since the choice should be as
+unpredictable for Alice as possible. Once Alice receives the
+request, she has to produce an isomorphism. If she generated
+$H$ as shown in \eqref{hiso} and is asked for an isomorphism
+between $H$ and $G_1$, she just sends $\sigma'$. If she had
+been asked for an isomorphism between $H$ and $G_2$, she just
+has to compose her secret isomorphism $\sigma$ and $\sigma'$.
+The main point for the protocol is that even knowing the
+isomorphism between $H$ and $G_1$ or $H$ and $G_2$, will not
+make the task easier to find the isomorphism between $G_1$ and
+$G_2$, which is the secret Alice tries to protect.
+
+In order to make it crystal clear how this protocol proceeds,
+let us give a version using our more formal notation for
+protocols:
+
+\begin{center}
+\begin{tabular}{lrl}
+0) & $A \to B:$ & $G_1$ and $G_2$\\
+1a) & $A \to B:$ & $H_1$ \\
+1b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_1$?
+ \quad(or $G_2 \leftrightarrow H_1$)\\
+1c) & $A \to B:$ & requested isomorphism\\
+2a) & $A \to B:$ & $H_2$\\
+2b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_2$?
+ \quad(or $G_2 \leftrightarrow H_2$)\\
+2c) & $A \to B:$ & requested isomorphism\\
+ & \ldots\\
+\end{tabular}
+\end{center}
+
+\noindent As can be seen the protocol runs for some
+agreed number of iterations. The $H_i$ Alice needs to
+produce, need to be all distinct. I hope you now know
+why?
+
+It is also crucial that in each iteration, Alice first sends
+$H_i$ and then Bob can decide which isomorphism he wants:
+either $G_1 \leftrightarrow H_i$ or $G_2 \leftrightarrow H_i$.
+If somehow Alice can find out before she committed to $H_i$,
+she can cheat. For this assume Alice does \emph{not} know an
+isomorphism between $G_1$ and $G_2$. If she knows which
+isomorphism Bob will ask for she can craft $H$ in such a way
+that it is isomorphism with either $G_1$ or $G_2$ (but it
+cannot with both). Then in each case she would send Bob
+a correct answer and he would come to the conclusion that
+all is well. I let you also answer the question whether
+such a protocol run between Alice and Bob would convince
+a third person, say Pete.
+
+Since the interactive nature of the above PKZ protocol and the
+correct ordering of the messages is so important for the
+``correctness'' of the protocol, it might look surprising that
+the same goal can also me achieved in a completely offline
+manner. By this I mean Alice can publish all data at once, and
+then at a later time, Bob can inspect the data and come to the
+conclusion whether or not Alice knows the secret (again
+without actually learning about the secret). For this
+Alice has to do the following:
+
+\begin{enumerate}
+\item Alice generates $n$ isomorphic graphs
+ $H_{1..n}$ (they need to be all distinct)
+\item she feeds the $H_{1..n}$ into a hashing function
+ (for example encoded as as matrix)
+\item she takes the first $n$ bits of the output of the hashing
+ function:
+ whenever the output is $0$, she shows an isomorphism
+ with $G_1$; for $1$ she shows an isomorphism
+ with $G_2$
+\end{enumerate}
+
+\noindent The reason why this works and achieves the same
+goal as the interactive variant is that Alice has no
+control over the hashing function. It would be
+computationally just too hard to assemble a set of
+$H_{1..n}$ such that she can force where $0$s and $1$s
+in the hash values are such that it would pass an external
+test. The point is that Alice can publish all this data
+on the comfort of her own web-page, for example, and
+in this way convince everybody who bothers to check.
+
+The virtue of the use of graphs and isomorphism for a
+zero-knowledge protocol is that the idea why it works
+are relatively straightforward. The disadvantage is
+that encoding any secret into a graph-isomorphism, while
+possible, is awkward. The good news is that in fact
+any NP problem can be used as part of a ZKP protocol.
+
+
+\subsubsection*{Using Modular Logarithms for ZKP Protocols}
+
+While information can be encoded into graph isomorphisms, it
+is not the most convenient carrier of information. Clearly it
+is much easier to encode information into numbers. Let us look
+at zero-knowledge proofs that use numbers as secrets. For this
+the underlying NP-problem is to calculate discrete logarithms.
+It can be used by choosing public numbers $A$, $B$, $p$, and
+private $x$ such that
+
+\begin{center}
+$A^x \equiv B\; mod\; p$
+\end{center}
+
+\noindent holds. The secret Alice tries to keep secret is $x$.
+The point of the modular logarithm is that it is very hard
+from the public data to calculate $x$ (for large primes).
+Now the protocol proceeds in three stages:
+
+\begin{itemize}
+\item {\bf Commitment Stage}
+ \begin{enumerate}
+ \item Alice generates $z$ random numbers $r_1, \ldots, r_z$,
+ all less than $p - 1$. Alice then sends Bob for all
+ $i = 1,\ldots, z$:
+ \[ h_i = A^{r_i}\; mod\; p\]
+ \item Bob generates $z$ random bits, say $b_1,\ldots, b_z$. He can do this
+ by flipping $z$ times a coin, for example.
+ \item For each bit $b_i$, Alice sends Bob an $s_i$ where
+
+ \begin{center}
+ \begin{tabular}{ll}
+ if $b_i = 0$: & $s_i = r_i$\\
+ if $b_i = 1$: & $s_i = (r_i - r_j) \;mod\; (p -1)$\\
+ \end{tabular}
+ \end{center}
+
+ where $r_j$ is the lowest $j$ where $b_j = 1$.
+
+ \end{enumerate}
+ \end{itemize}
+
+\noindent For understanding the last step, let $z$ be just 4.
+We have four random values $r_i$ chosen by Alice and four
+random bits $b_i$ chosen subsequently by Bob, for example
+
+ \begin{center}
+ \begin{tabular}{lcccc}
+ $r_i$:\; & 4 & 9 & 1 & 3\\
+ $b_i$:\; & 0 & 1 & 0 & 1\\
+ & & $\uparrow$ \\
+ & & $j$
+ \end{tabular}
+ \end{center}
+
+ \noindent The highlighted column is the lowest where $b_i =
+ 1$ (counted from the left). That means $r_j = 9$. The reason
+ for letting Alice choose the random numbers $r_1, \ldots, r_z$
+ will become clear shortly. Next is the confirmation
+ phase where Bob essentially checks whether Alice has sent
+ him ``correct'' $s_i$ and $h_i$.
+
+\begin{itemize}
+\item {\bf Confirmation Stage}
+ \begin{enumerate}
+ \item For each $b_i$ Bob checks whether $s_i$
+ conform to the protocol
+
+ \begin{center}
+ \begin{tabular}{ll}
+ if $b_i = 0$: & $A^{s_i} \stackrel{?}{\equiv} h_i\;mod\;p$\\
+ if $b_i = 1$: & $A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1} \;mod\; p$\\
+ \end{tabular}
+ \end{center}
+ \end{enumerate}
+\end{itemize}
+
+\noindent To understand the case for $b_i = 1$, you have
+to do the following calculation:
+
+\begin{center}
+\begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l}
+$A^{s_i}$ & $=$ & $A^{r_i - r_j}$\\
+ & $=$ & $A^{r_i} * A^{-r_j}$\\
+ & $=$ & $h_{r_i} * h_{r_j}^{-1}\;mod\;p$
+\end{tabular}
+\end{center}
+
+\noindent What is interesting that so far nothing has been
+sent about $x$, which is the secret Alice has. Also notice
+that Bob does not know $r_j$. He received
+
+\begin{center}
+$r_j - r_j$, $r_m - r_j$, \ldots, $r_p - r_j \;mod \;p - 1$
+\end{center}
+
+\noindent whenever his corresponding bits were $1$. So Bob
+does not know $r_j$ and also does not know any $r_i$ where the
+bit was $1$. Information about the $x$ is sent in the next
+stage (obviously not revealing $x$).
+
+\begin{itemize}
+\item {\bf Proving Stage}
+
+\begin{enumerate}
+\item Alice proves she knows $x$, the discrete log of $B$,
+by sending
+
+\begin{center}
+$s_{z+1} = x - r_j\;mod\;p-1$
+\end{center}
+
+\item Bob confirms
+
+\begin{center}
+$A^{s_{z+1}} \stackrel{?}{\equiv} B * h_j^{-1} \;mod \; p$
+\end{center}
+\end{enumerate}
+\end{itemize}
+
+\noindent To understand the last step, you have to do the trick
+again that
+
+\[A^{s_{z+1}} = A^{x-r_j} = \ldots
+\]
+
+\noindent
+which I leave to you.
+
+Now the question is how can Alice cheat? In order to cheat she
+has to coordinate what she sends as $h_i$ in step 1 and $s_i$
+in step 3 of the commitment stage, and also what to send as
+$s_{z+1}$ in the proving stage. For the latter of course
+Alice does not know $x$, so she just chooses some random
+number for $s_{z+1}$ and calculates
+
+\[A^{s_{z+1}}\]
+
+\noindent
+and then solves the equation
+
+\[A^{s_{z+1}} \equiv B * y \;mod\;p\]
+
+\noindent for $y$. This is easy since no logarithm needs to be
+computed. If Alice can guess the $j$ where the first 1 will
+appear in Bob's bit vector, then she sends the inverse of $y$
+as $h_j$ and 0 as $s_j$. However, notice that when she
+calculates a solution for $y$ she does not know $r_j$. For this she
+would need to calculate the modular logarithm
+
+\[y \equiv A^{r_j}\;mod\;p\]
+
+\noindent which is hard (see step 1 in the commitment stage).
+
+Having settled on what $h_j$ should be, now what should Alice
+send as the other $h_i$ and other $s_i$? If the $b_i$ happens
+to be a 1, then the $h_i$ and other $s_i$ need to satisfy the
+test
+
+\[A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1} \;mod\; p\]
+
+\noindent where she has already settled on the value of
+$h_j^{-1}$. Lets say she choses $s_i$ at random, then she just
+needs to solve
+
+\[A^{s_i} \equiv z * h_j^{-1} \;mod\; p\]
+
+\noindent for $z$. Again that is easy, but it does not allow
+us to know $r_i$, because then we would again need to solve
+a modular logarithm problem. Let us call an $h_i$ which was
+solved the easy way as \emph{bogus}. Alice has to produce
+bogus $h_i$ for all bits that are going to be $1$ in advance!
+This means she has to guess all the bits correctly. (Yes?
+I let you think about this.)
+
+Let us see what happens if she guesses wrongly: Suppose the
+bit $b_i = 1$ where she thought she will get a 0. Then she has
+already sent an $h_i$ and $h_j$ and now must find an $s_i$
+such that
+
+\[A^{s_i} \equiv h_i * h_j^{-1} \;mod\; p\]
+
+\noindent holds. For this remember in calculating $h_i$, she
+just chose a random $s_i$. Now she has to send a genuine one.
+But this is of course too hard. If she knew the genuine $r_i$
+and $r_j$ for $h_i$ and $h_j$, it would be easy (in this case
+$s_i = r_i - r_j$). But she does not. So she will be found
+out. If $b_i = 0$, but she thought she will get a 1, then
+she has to send a $s_i$ which satisfies
+
+\[A^{s_i} \equiv h_i\;mod\;p\]
+
+\noindent Again she does not know $r_i$. So it is a too hard
+task and she will be found out again.
+
+To sum up, in order for Alice to successfully cheat Bob, she
+needs to guess \emph{all} bits correctly. She has only a
+$\frac{1}{2^z}$ chance of doing this.
+
+\subsubsection*{Further Reading}
+
+Make sure you understand what NP problems
+are.\footnote{\url{http://en.wikipedia.org/wiki/NP_(complexity)}}
+They are the building blocks for zero-knowledge proofs.
+Zero-Knowldege proofs are not yet widely used in production
+systems, but it is slowly gaining ground. One area of application
+where they pop up is crypto currencies (for example Zerocoins
+or how to make sure a Bitcoin exchange is solvent without
+revealing its assets).
+
+If you want to brush up on the modular logarithm problem,
+the Khan Academy has a nice video:
+
+\begin{center}
+\url{https://www.khanacademy.org/video/discrete-logarithm-problem}
+\end{center}
+
+\end{document}
+
+http://blog.cryptographyengineering.com/2014/11/zero-knowledge-proofs-illustrated-primer.html
+
+http://btravers.weebly.com/uploads/6/7/2/9/6729909/zero_knowledge_technique.pdf
+http://zk-ssh.cms.ac/docs/Zero_Knowledge_Prinzipien.pdf
+http://www.wisdom.weizmann.ac.il/~oded/PS/zk-tut02v4.ps
+
+socialist millionares problem
+http://en.wikipedia.org/wiki/Socialist_millionaire
+http://twistedoakstudios.com/blog/Post3724_explain-it-like-im-five-the-socialist-millionaire-problem-and-secure-multi-party-computation
+
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: t
+%%% End:
Binary file hws/hw01.pdf has changed
Binary file hws/hw02.pdf has changed
Binary file hws/hw03.pdf has changed
Binary file hws/hw04.pdf has changed
Binary file hws/hw05.pdf has changed
Binary file hws/hw06.pdf has changed
--- a/hws/hw06.tex Sat Sep 23 19:39:53 2017 +0100
+++ b/hws/hw06.tex Sat Sep 23 19:52:27 2017 +0100
@@ -1,21 +1,65 @@
\documentclass{article}
\usepackage{../style}
-
\begin{document}
\section*{Homework 6}
-\HEADER
+\begin{enumerate}
+\item What are good uses of anonymity services like Tor?
+
+\item What is meant by the notion \emph{forward privacy}?
+
+\item What is a \emph{re-identification attack}?
-\begin{enumerate}
+\item Imagine you have a completely `innocent' email message,
+ like birthday wishes to your grandmother. Why should you
+ still encrypt this message and your grandmother take the
+ effort to decrypt it?
+ (Hint: The answer has nothing to do with preserving the
+ privacy of your grandmother and nothing to do with
+ keeping her birthday wishes supersecret. Also nothing to
+ do with you and grandmother testing the latest
+ encryption technology, nor just for the sake of it.)
-
+\item One part of achieving privacy (but not the only one) is to
+ properly encrypt your conversations on the Internet. But this is
+ fiercely resisted by some spy agencies. These agencies (and some
+ politicians for that matter) argue that, for example, ISIL's
+ recruiters broadcast messages on, say, Twitter, and get people to
+ follow them. Then they move potential recruits to Twitter Direct
+ Messaging to evaluate if they are a legitimate recruit. If yes, they
+ move them to an encrypted mobile-messaging app. The spy agencies
+ argue that although they can follow the conversations on Twitter,
+ they ``go dark'' on the encrypted message app. To counter this
+ ``going-dark problem'', the spy agencies push for the implementation
+ of back-doors in iMessage and Facebook and Skype and everything else
+ UK or US-made, which they can use eavesdrop on conversations without
+ the conversants' knowledge or consent.\medskip
+
+ What is the fallacy in the spy agencies going-dark argument?
+ (Hint: Think what would happen if the spy agencies and certain
+ politicians get their wish.)
+
+\item DNA data is very sensitive and can easily violate the privacy of
+ (living) people. To get around this, two scientists from Denmark
+ proposed to create a \emph{necrogenomic database} which would record
+ the DNA data of all Danish citizens and residents at the time of
+ their \emph{death}. By matching these to information about illnesses
+ and ailments in life, helpful evidence could be gathered about the
+ genetic origins of diseases. The idea is that the privacy of dead
+ people cannot be violated.
-\item \POSTSCRIPT
-\end{enumerate}
+ What is the fallacy behind this reasoning?
+\item A few years ago a Google executive tried to allay worries about
+ Google pooring over all your emails on Gmail. He said something
+ along the lines: you are watched by an algorithm; this is like being
+ naked in front of your dog. What is wrong with this argument?
+
+\item \POSTSCRIPT
+\end{enumerate}
\end{document}
%%% Local Variables:
Binary file hws/hw07.pdf has changed
--- a/hws/hw07.tex Sat Sep 23 19:39:53 2017 +0100
+++ b/hws/hw07.tex Sat Sep 23 19:52:27 2017 +0100
@@ -3,63 +3,37 @@
\begin{document}
-\section*{Homework 6}
+
+% For Alice to cheat, she has to get her transaction into the blockchain.
+% For this she has to solve proof-of-work puzzles faster than anybody
+% else. Is it possible for her to precompute several blocks that would
+% validate a fraudulent transaction by her? Give a short explanation
+% for your reasoning.
+
+\section*{Homework 7}
+
+\HEADER
\begin{enumerate}
-\item What are good uses of anonymity services like Tor?
-
-\item What is meant by the notion \emph{forward privacy}?
-
-\item What is a \emph{re-identification attack}?
+\item How can the hardness of the proof-of-work puzzles in
+ Bitcoins be adjusted? What is parameter that determines
+ how the hardness is adjusted?
-\item Imagine you have a completely `innocent' email message,
- like birthday wishes to your grandmother. Why should you
- still encrypt this message and your grandmother take the
- effort to decrypt it?
-
- (Hint: The answer has nothing to do with preserving the
- privacy of your grandmother and nothing to do with
- keeping her birthday wishes supersecret. Also nothing to
- do with you and grandmother testing the latest
- encryption technology, nor just for the sake of it.)
+\item What is the main data that is stored in Bitcoin's
+ blockchain?
+
+\item What is is the purpose of the proof-of-work puzzle in
+ Bitcoins?
-\item One part of achieving privacy (but not the only one) is to
- properly encrypt your conversations on the Internet. But this is
- fiercely resisted by some spy agencies. These agencies (and some
- politicians for that matter) argue that, for example, ISIL's
- recruiters broadcast messages on, say, Twitter, and get people to
- follow them. Then they move potential recruits to Twitter Direct
- Messaging to evaluate if they are a legitimate recruit. If yes, they
- move them to an encrypted mobile-messaging app. The spy agencies
- argue that although they can follow the conversations on Twitter,
- they ``go dark'' on the encrypted message app. To counter this
- ``going-dark problem'', the spy agencies push for the implementation
- of back-doors in iMessage and Facebook and Skype and everything else
- UK or US-made, which they can use eavesdrop on conversations without
- the conversants' knowledge or consent.\medskip
-
- What is the fallacy in the spy agencies going-dark argument?
- (Hint: Think what would happen if the spy agencies and certain
- politicians get their wish.)
-
-\item DNA data is very sensitive and can easily violate the privacy of
- (living) people. To get around this, two scientists from Denmark
- proposed to create a \emph{necrogenomic database} which would record
- the DNA data of all Danish citizens and residents at the time of
- their \emph{death}. By matching these to information about illnesses
- and ailments in life, helpful evidence could be gathered about the
- genetic origins of diseases. The idea is that the privacy of dead
- people cannot be violated.
+\item The department has large labs full of computers that are
+ pretty much idle over night. Why is it a bad idea to let
+ them mine for Bitcoins?
- What is the fallacy behind this reasoning?
+\item Is it possible that Bitcoins can get lost (be
+ irretrievable)?
-\item A few years ago a Google executive tried to allay worries about
- Google pooring over all your emails on Gmail. He said something
- along the lines: you are watched by an algorithm; this is like being
- naked in front of your dog. What is wrong with this argument?
-
-\item \POSTSCRIPT
-\end{enumerate}
+\item \POSTSCRIPT
+\end{enumerate}
\end{document}
%%% Local Variables:
Binary file hws/hw08.pdf has changed
--- a/hws/hw08.tex Sat Sep 23 19:39:53 2017 +0100
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,42 +0,0 @@
-\documentclass{article}
-\usepackage{../style}
-
-\begin{document}
-
-
-% For Alice to cheat, she has to get her transaction into the blockchain.
-% For this she has to solve proof-of-work puzzles faster than anybody
-% else. Is it possible for her to precompute several blocks that would
-% validate a fraudulent transaction by her? Give a short explanation
-% for your reasoning.
-
-\section*{Homework 7}
-
-\HEADER
-
-\begin{enumerate}
-\item How can the hardness of the proof-of-work puzzles in
- Bitcoins be adjusted? What is parameter that determines
- how the hardness is adjusted?
-
-\item What is the main data that is stored in Bitcoin's
- blockchain?
-
-\item What is is the purpose of the proof-of-work puzzle in
- Bitcoins?
-
-\item The department has large labs full of computers that are
- pretty much idle over night. Why is it a bad idea to let
- them mine for Bitcoins?
-
-\item Is it possible that Bitcoins can get lost (be
- irretrievable)?
-
-\item \POSTSCRIPT
-\end{enumerate}
-\end{document}
-
-%%% Local Variables:
-%%% mode: latex
-%%% TeX-master: t
-%%% End: