diff -r 98ae49ffc262 -r 62985f147c85 handouts/ho99-zkp.tex --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/handouts/ho99-zkp.tex Tue Sep 26 12:10:41 2017 +0100 @@ -0,0 +1,622 @@ +\documentclass{article} +\usepackage{../style} + +\begin{document} +\fnote{\copyright{} Christian Urban, King's College London, 2014, 2015} + +\section*{Handout 6 (Zero-Knowledge Proofs)} + +Zero-knowledge proofs (short ZKP) solve a paradoxical puzzle: +How to convince somebody else that one knows a secret, without +revealing what the secret actually is? This sounds like a +problem the Mad Hatter from Alice in Wonderland would occupy +himself with, but actually there some serious and not so +serious applications of it. For example, if you solve +crosswords with your friend, say Bob, you might want to +convince him that you found a solution for one question, but +of course you do not want to reveal the solution, as this +might give Bob an advantage somewhere else in the crossword. + +So how to convince Bob that you know the answer (or a secret)? +One way would be to come up with the following protocol: +Suppose the answer is \textit{folio}. Then look up the +definition of \textit{folio} in a dictionary. Say you find: + +\begin{quote} +``an \textit{individual} leaf of paper or parchment, either +loose as one of a series or forming part of a bound volume, +which is numbered on the recto or front side only.'' +\end{quote} + +\noindent +Take the first non-small word\footnote{Let's say the, a, an, +or, and \ldots fall into the category of small words.} in this definition, +in this case \textit{individual}, and look up the definition +of this word, say + +\begin{quote} +``a single \textit{human} being as distinct from a group'' +\end{quote} + +\noindent +In this definition take the second non-small word, that +is \textit{human}, and again look up the definition of this +word. This will yield + +\begin{quote} +``relating to or \textit{characteristic} of humankind'' +\end{quote} + +\noindent +You could go on looking up the definition of the third +non-small word in this definition and so on. But let us assume +you agreed with Bob to stop after three iterations with the +third non-article word in the last definition, that is +\textit{or}. Now, instead of sending to Bob the solution +\textit{folio}, you send to him \textit{characteristic}. + +How can Bob verify that you know the solution? Well, once he +solved it himself, he can use the dictionary and follow the +same ``trail'' as you did. If the final word agrees with what +you had sent him, he must infer you knew the solution earlier +than him. This protocol works like a one-way hash function +because \textit{characteristic} does not give any hint as to +what was the first word was. I leave you to think why this +protocol avoids small words? + +After Bob found his solution and verified that according to +the protocol it ``maps'' also to \textit{characteristic}, can +he be entirely sure no cheating is going on? Not with 100\% +certainty. It could have been possible that he was given +\textit{characteristic} as the word, but it derived from a +different word. This might seem very unlikely, but at least +theoretical it is a possibility. Protocols based on +zero-knowledge proofs will produce a similar result---they +give an answer that might be erroneous in a very small number +of cases. The point is to iterate them long enough so that the +theoretical possibility of cheating is negligibly small. + +By the way, the authors of the paper ``Dismantling Megamos +Crypto: Wirelessly Lockpicking a Vehicle Immobilizer'' who +were barred from publishing their results used also a hash to +prove they did the work and (presumably) managed to get into +cars without a key; see Figure~\ref{paper}. This is very +similar to the method above about crosswords: They like to +prove that they did the work, but not giving out the +``solution''. But this also shows what the problem with such a +method is: yes, we can hide the secret temporarily, but if +somebody else wants to verify it, then the secret has to be +made public. Bob needs to know that \textit{folio} is the +solution before he can verify the claim of Alice that she had +the solution first. Similarly with the car-crypto paper: we +needed to wait until September 2015 when the authors were +finally able to publish their findings in order to verify the +hash. Zero-knowledge proofs, in contrast, can be immediately +checked, even if the secret is not public yet and perhaps +never will be. + +\begin{figure} +\begin{center} +\addtolength{\fboxsep}{4mm} +\fbox{\includegraphics[scale=0.4]{../pics/Dismantling_Megamos_Crypto.png}} +\end{center} +\caption{The authors of this paper used a hash in order to prove +later that they have managed to break into cars.\label{paper}} +\end{figure} + + +\subsubsection*{ZKP: An Illustrative Example} + +The idea behind zero-knowledge proofs is not very obvious and +will surely take some time for you to digest. Therefore let us +start with a simple illustrative example. The example will not +be perfect, but hopefully explain the gist of the idea. The +example is called Alibaba's cave, which graphically looks as +follows:\footnote{The example is taken from an +article titled ``How to Explain Zero-Knowledge Protocols +to Your Children'' available from +\url{http://pages.cs.wisc.edu/~mkowalcz/628.pdf}.} + +\begin{center} +\begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c} +\includegraphics[scale=0.1]{../pics/alibaba1.png} & +\includegraphics[scale=0.1]{../pics/alibaba2.png} & +\includegraphics[scale=0.1]{../pics/alibaba3.png} \\ +Step 1 & Step 2 & Step 3 +\end{tabular} +\end{center} + +\noindent Let us take a closer look at the picture in Step 1: +The cave has a tunnel which forks at some point. Both forks +are connected in a loop. At the deep end of the loop is a +magic wand. The point of the magic wand is that Alice knows +the secret word for how to open it. She wants to keep the word +secret, but wants to convince Bob that she knows it. + +One way of course would be to let Bob follow her, but then he +would also find out the secret. So this does not work. To find +a solution, let us first fix the rules: At the beginning Alice +and Bob are outside the cave. Alice goes in alone and takes +either tunnel labelled $A$ in the picture, or the other tunnel +labelled $B$. She waits at the magic wand for the instructions +from Bob, who also goes into the gave and observes what +happens at the fork. He has no knowledge which tunnel Alice +took and calls out (in Step 2) that she should emerge from tunnel +$A$, for example. If she knows the secret for opening the +wand, this will not be a problem for Alice. If she was already +in the $A$-segment of the tunnel, then she just comes back. If +she was in the $B$-segment of the tunnel she will say the magic +word and goes through the wand to emerge from $A$ as requested +by Bob. + +Let us have a look at the case where Alice cheats, that is not +knows the secret. She would still go into the cave and use, +for example the $B$-segment of the tunnel. If now Bob says she +should emerge from $B$, she is lucky. But if he says she +should emerge from $A$ then Alice is in trouble: Bob will find +out she does not actually know the secret. So in order to fool +Bob she needs to anticipate his call, and already go into the +corresponding tunnel. This of course also does not work, since +Bob can make any call he wants. Consequently in order to find +out whether Alice cheats, Bob just needs to repeat this +protocol many times. Each time Alice has a chance of +$\frac{1}{2}$ to be lucky or being found out. Iterating this +$n$ times means she must be right every time and when +cheating: the probability for this is $\frac{1}{2}^n$, number +that for already relatively small $n$, say 10, is incredibly +small. + + +There are some interesting observations we can make about +Alibaba's cave and the ZKP protocol between Alice and Bob: + +\begin{itemize} +\item {\bf Completeness} If Alice knows the secret, Bob + accepts Alice ``proof'' for sure. There is no error + possible in that Bob thinks Alice cheats when she + actually knows the secret. +\item {\bf Soundness} If Alice does not know the secret, + Bob accepts her ``proof'' with a very small probability. + If, as in the example above, the probability of being + able to hide cheating is $\frac{1}{2}$ in each round + it will be $\frac{1}{2}^n$ after $n$-rounds, which even + for small $n$ say $> 10$ is very small indeed. +\item {\bf Zero-Knowledge} Even if Bob accepts + the proof by Alice, he cannot convince anybody else. +\end{itemize} + +\noindent The last property is the most interesting one. +Assume Alice has convinced Bob that she knows the secret and +Bob filmed the whole protocol with a camera. Can he use the +video to convince anybody else? After a moment of thought, you +will agree that this is not the case. Alice and Bob might have +just made it all up and colluded by Bob telling Alice +beforehand which tunnel he will call. In this way it appears +as if all iterations of the protocol were successful, but they +prove nothing. If another person wants to find out whether +Alice knows the secret, he or she would have to conduct the +protocol again. This is actually the formal definition of a +zero-knowledge proof: an independent observer cannot +distinguish between a real protocol (where Alice knows the +secret) and a fake one (where Bob and Alice colluded). + + +\subsubsection*{Using an Graph-Isomorphism Problem for ZKPs} + +Now the question is how can we translate Alibaba's cave into a +computer science solution? It turns out we need an NP problem +for that. The main feature of an NP problem is that it is +computational very hard to generate a solution, but it is very +easy to check whether a given solution indeed solves the +problem at hand.\footnote{The question whether $P = NP$ or not, +we leave to others to speculate about.} + +One NP problem is the \emph{graph isomorphism problem}. It +essentially determines whether the following two graphs, say +$G_1$ and $G_2$, can be moved and stretched so that they look +exactly the same. + +\begin{center} +\begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c} +$G_1$ & $G_2$\\ +\raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple.png}} & +\raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple-b.png}}& + +\footnotesize +\begin{tabular}{rl} +Graph $G_1$ & Graph $G_2$\\ +a & 1\\ +b & 6\\ +c & 8\\ +d & 3\\ +g & 5\\ +h & 2\\ +i & 4\\ +j & 7\\ +\end{tabular} +\end{tabular} +\end{center} + +\noindent The table on the right gives a mapping of the nodes +of the first graph to the nodes of the second. With this +mapping we can check: node $a$ is connected in $G_1$ with $g$, +$h$ and $i$. Node $a$ maps to node $1$ in $G_2$, which is +connected to $2$, $4$ and $5$, which again correspond via the +mapping to $h$, $i$ and $g$ respectively. Let us write +$\sigma$ for such a table and let us write + +\[G_1 = \sigma(G_2)\] + +\noindent for two isomorphic graphs ($\sigma$ being the +isomorphism). It is actually very easy to construct two +isomorphic graphs: Start with an arbitrary graph, re-label the +nodes consistently. Alice will need to do this frequently +for the protocol below. In order to be useful, we therefore +would need to consider substantially larger graphs, like +with thousand nodes. + +Now the secret which Alice tries to hide is the knowledge of +such an isomorphism $\sigma$ between two such graphs. But she +can convince Bob whether she knows such an isomorphism for two +graphs, say $G_1$ and $G_2$, using the same idea as in the +example with Alibaba's cave. For this Alice and Bob must +follow the following protocol: + +\begin{enumerate} +\item Alice generates an isomorphic graph $H$ which she sends + to Bob (in each iteration she needs to generate a + different $H$). +\item + After receiving $H$, Bob asks Alice either for an + isomorphism between $G_1$ and $H$, or $G_2$ and $H$. +\item Alice and Bob repeat this procedure $n$ times. +\end{enumerate} + +\noindent In Step 1 it is important that Alice always +generates a fresh isomorphic graph. I let you think what +would happen if Alice sends out twice the same graph $H$. + +As said before, this is relatively easy to generate by +consistently relabelling nodes. If she started from $G_1$, +Alice will have generated + +\begin{equation} +H = \sigma'(G_1)\label{hiso} +\end{equation} + +\noindent where $\sigma'$ is the isomorphism between $H$ and +$G_1$. But she could equally have started from $G_2$. In the +case where $G_1$ and $G_2$ are isomorphic, if $H$ is +isomorphic with $G_1$, it will also be isomorphic with +$G_2$, and vice versa. + +After generating $H$, Alice sends it to Bob. The important +point is that she needs to ``commit'' to this $H$, therefore +this kind of zero-knowledge protocols are called +\emph{commitment protocols}. Only after receiving $H$, Bob +will make up his mind about which isomorphism he asks +for---whether between $H$ and $G_1$ or $H$ and $G_2$. For this +he could flip a coin, since the choice should be as +unpredictable for Alice as possible. Once Alice receives the +request, she has to produce an isomorphism. If she generated +$H$ as shown in \eqref{hiso} and is asked for an isomorphism +between $H$ and $G_1$, she just sends $\sigma'$. If she had +been asked for an isomorphism between $H$ and $G_2$, she just +has to compose her secret isomorphism $\sigma$ and $\sigma'$. +The main point for the protocol is that even knowing the +isomorphism between $H$ and $G_1$ or $H$ and $G_2$, will not +make the task easier to find the isomorphism between $G_1$ and +$G_2$, which is the secret Alice tries to protect. + +In order to make it crystal clear how this protocol proceeds, +let us give a version using our more formal notation for +protocols: + +\begin{center} +\begin{tabular}{lrl} +0) & $A \to B:$ & $G_1$ and $G_2$\\ +1a) & $A \to B:$ & $H_1$ \\ +1b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_1$? + \quad(or $G_2 \leftrightarrow H_1$)\\ +1c) & $A \to B:$ & requested isomorphism\\ +2a) & $A \to B:$ & $H_2$\\ +2b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_2$? + \quad(or $G_2 \leftrightarrow H_2$)\\ +2c) & $A \to B:$ & requested isomorphism\\ + & \ldots\\ +\end{tabular} +\end{center} + +\noindent As can be seen the protocol runs for some +agreed number of iterations. The $H_i$ Alice needs to +produce, need to be all distinct. I hope you now know +why? + +It is also crucial that in each iteration, Alice first sends +$H_i$ and then Bob can decide which isomorphism he wants: +either $G_1 \leftrightarrow H_i$ or $G_2 \leftrightarrow H_i$. +If somehow Alice can find out before she committed to $H_i$, +she can cheat. For this assume Alice does \emph{not} know an +isomorphism between $G_1$ and $G_2$. If she knows which +isomorphism Bob will ask for she can craft $H$ in such a way +that it is isomorphism with either $G_1$ or $G_2$ (but it +cannot with both). Then in each case she would send Bob +a correct answer and he would come to the conclusion that +all is well. I let you also answer the question whether +such a protocol run between Alice and Bob would convince +a third person, say Pete. + +Since the interactive nature of the above PKZ protocol and the +correct ordering of the messages is so important for the +``correctness'' of the protocol, it might look surprising that +the same goal can also me achieved in a completely offline +manner. By this I mean Alice can publish all data at once, and +then at a later time, Bob can inspect the data and come to the +conclusion whether or not Alice knows the secret (again +without actually learning about the secret). For this +Alice has to do the following: + +\begin{enumerate} +\item Alice generates $n$ isomorphic graphs + $H_{1..n}$ (they need to be all distinct) +\item she feeds the $H_{1..n}$ into a hashing function + (for example encoded as as matrix) +\item she takes the first $n$ bits of the output of the hashing + function: + whenever the output is $0$, she shows an isomorphism + with $G_1$; for $1$ she shows an isomorphism + with $G_2$ +\end{enumerate} + +\noindent The reason why this works and achieves the same +goal as the interactive variant is that Alice has no +control over the hashing function. It would be +computationally just too hard to assemble a set of +$H_{1..n}$ such that she can force where $0$s and $1$s +in the hash values are such that it would pass an external +test. The point is that Alice can publish all this data +on the comfort of her own web-page, for example, and +in this way convince everybody who bothers to check. + +The virtue of the use of graphs and isomorphism for a +zero-knowledge protocol is that the idea why it works +are relatively straightforward. The disadvantage is +that encoding any secret into a graph-isomorphism, while +possible, is awkward. The good news is that in fact +any NP problem can be used as part of a ZKP protocol. + + +\subsubsection*{Using Modular Logarithms for ZKP Protocols} + +While information can be encoded into graph isomorphisms, it +is not the most convenient carrier of information. Clearly it +is much easier to encode information into numbers. Let us look +at zero-knowledge proofs that use numbers as secrets. For this +the underlying NP-problem is to calculate discrete logarithms. +It can be used by choosing public numbers $A$, $B$, $p$, and +private $x$ such that + +\begin{center} +$A^x \equiv B\; mod\; p$ +\end{center} + +\noindent holds. The secret Alice tries to keep secret is $x$. +The point of the modular logarithm is that it is very hard +from the public data to calculate $x$ (for large primes). +Now the protocol proceeds in three stages: + +\begin{itemize} +\item {\bf Commitment Stage} + \begin{enumerate} + \item Alice generates $z$ random numbers $r_1, \ldots, r_z$, + all less than $p - 1$. Alice then sends Bob for all + $i = 1,\ldots, z$: + \[ h_i = A^{r_i}\; mod\; p\] + \item Bob generates $z$ random bits, say $b_1,\ldots, b_z$. He can do this + by flipping $z$ times a coin, for example. + \item For each bit $b_i$, Alice sends Bob an $s_i$ where + + \begin{center} + \begin{tabular}{ll} + if $b_i = 0$: & $s_i = r_i$\\ + if $b_i = 1$: & $s_i = (r_i - r_j) \;mod\; (p -1)$\\ + \end{tabular} + \end{center} + + where $r_j$ is the lowest $j$ where $b_j = 1$. + + \end{enumerate} + \end{itemize} + +\noindent For understanding the last step, let $z$ be just 4. +We have four random values $r_i$ chosen by Alice and four +random bits $b_i$ chosen subsequently by Bob, for example + + \begin{center} + \begin{tabular}{lcccc} + $r_i$:\; & 4 & 9 & 1 & 3\\ + $b_i$:\; & 0 & 1 & 0 & 1\\ + & & $\uparrow$ \\ + & & $j$ + \end{tabular} + \end{center} + + \noindent The highlighted column is the lowest where $b_i = + 1$ (counted from the left). That means $r_j = 9$. The reason + for letting Alice choose the random numbers $r_1, \ldots, r_z$ + will become clear shortly. Next is the confirmation + phase where Bob essentially checks whether Alice has sent + him ``correct'' $s_i$ and $h_i$. + +\begin{itemize} +\item {\bf Confirmation Stage} + \begin{enumerate} + \item For each $b_i$ Bob checks whether $s_i$ + conform to the protocol + + \begin{center} + \begin{tabular}{ll} + if $b_i = 0$: & $A^{s_i} \stackrel{?}{\equiv} h_i\;mod\;p$\\ + if $b_i = 1$: & $A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1} \;mod\; p$\\ + \end{tabular} + \end{center} + \end{enumerate} +\end{itemize} + +\noindent To understand the case for $b_i = 1$, you have +to do the following calculation: + +\begin{center} +\begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l} +$A^{s_i}$ & $=$ & $A^{r_i - r_j}$\\ + & $=$ & $A^{r_i} * A^{-r_j}$\\ + & $=$ & $h_{r_i} * h_{r_j}^{-1}\;mod\;p$ +\end{tabular} +\end{center} + +\noindent What is interesting that so far nothing has been +sent about $x$, which is the secret Alice has. Also notice +that Bob does not know $r_j$. He received + +\begin{center} +$r_j - r_j$, $r_m - r_j$, \ldots, $r_p - r_j \;mod \;p - 1$ +\end{center} + +\noindent whenever his corresponding bits were $1$. So Bob +does not know $r_j$ and also does not know any $r_i$ where the +bit was $1$. Information about the $x$ is sent in the next +stage (obviously not revealing $x$). + +\begin{itemize} +\item {\bf Proving Stage} + +\begin{enumerate} +\item Alice proves she knows $x$, the discrete log of $B$, +by sending + +\begin{center} +$s_{z+1} = x - r_j\;mod\;p-1$ +\end{center} + +\item Bob confirms + +\begin{center} +$A^{s_{z+1}} \stackrel{?}{\equiv} B * h_j^{-1} \;mod \; p$ +\end{center} +\end{enumerate} +\end{itemize} + +\noindent To understand the last step, you have to do the trick +again that + +\[A^{s_{z+1}} = A^{x-r_j} = \ldots +\] + +\noindent +which I leave to you. + +Now the question is how can Alice cheat? In order to cheat she +has to coordinate what she sends as $h_i$ in step 1 and $s_i$ +in step 3 of the commitment stage, and also what to send as +$s_{z+1}$ in the proving stage. For the latter of course +Alice does not know $x$, so she just chooses some random +number for $s_{z+1}$ and calculates + +\[A^{s_{z+1}}\] + +\noindent +and then solves the equation + +\[A^{s_{z+1}} \equiv B * y \;mod\;p\] + +\noindent for $y$. This is easy since no logarithm needs to be +computed. If Alice can guess the $j$ where the first 1 will +appear in Bob's bit vector, then she sends the inverse of $y$ +as $h_j$ and 0 as $s_j$. However, notice that when she +calculates a solution for $y$ she does not know $r_j$. For this she +would need to calculate the modular logarithm + +\[y \equiv A^{r_j}\;mod\;p\] + +\noindent which is hard (see step 1 in the commitment stage). + +Having settled on what $h_j$ should be, now what should Alice +send as the other $h_i$ and other $s_i$? If the $b_i$ happens +to be a 1, then the $h_i$ and other $s_i$ need to satisfy the +test + +\[A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1} \;mod\; p\] + +\noindent where she has already settled on the value of +$h_j^{-1}$. Lets say she choses $s_i$ at random, then she just +needs to solve + +\[A^{s_i} \equiv z * h_j^{-1} \;mod\; p\] + +\noindent for $z$. Again that is easy, but it does not allow +us to know $r_i$, because then we would again need to solve +a modular logarithm problem. Let us call an $h_i$ which was +solved the easy way as \emph{bogus}. Alice has to produce +bogus $h_i$ for all bits that are going to be $1$ in advance! +This means she has to guess all the bits correctly. (Yes? +I let you think about this.) + +Let us see what happens if she guesses wrongly: Suppose the +bit $b_i = 1$ where she thought she will get a 0. Then she has +already sent an $h_i$ and $h_j$ and now must find an $s_i$ +such that + +\[A^{s_i} \equiv h_i * h_j^{-1} \;mod\; p\] + +\noindent holds. For this remember in calculating $h_i$, she +just chose a random $s_i$. Now she has to send a genuine one. +But this is of course too hard. If she knew the genuine $r_i$ +and $r_j$ for $h_i$ and $h_j$, it would be easy (in this case +$s_i = r_i - r_j$). But she does not. So she will be found +out. If $b_i = 0$, but she thought she will get a 1, then +she has to send a $s_i$ which satisfies + +\[A^{s_i} \equiv h_i\;mod\;p\] + +\noindent Again she does not know $r_i$. So it is a too hard +task and she will be found out again. + +To sum up, in order for Alice to successfully cheat Bob, she +needs to guess \emph{all} bits correctly. She has only a +$\frac{1}{2^z}$ chance of doing this. + +\subsubsection*{Further Reading} + +Make sure you understand what NP problems +are.\footnote{\url{http://en.wikipedia.org/wiki/NP_(complexity)}} +They are the building blocks for zero-knowledge proofs. +Zero-Knowldege proofs are not yet widely used in production +systems, but it is slowly gaining ground. One area of application +where they pop up is crypto currencies (for example Zerocoins +or how to make sure a Bitcoin exchange is solvent without +revealing its assets). + +If you want to brush up on the modular logarithm problem, +the Khan Academy has a nice video: + +\begin{center} +\url{https://www.khanacademy.org/video/discrete-logarithm-problem} +\end{center} + +\end{document} + +http://blog.cryptographyengineering.com/2014/11/zero-knowledge-proofs-illustrated-primer.html + +http://btravers.weebly.com/uploads/6/7/2/9/6729909/zero_knowledge_technique.pdf +http://zk-ssh.cms.ac/docs/Zero_Knowledge_Prinzipien.pdf +http://www.wisdom.weizmann.ac.il/~oded/PS/zk-tut02v4.ps + +socialist millionares problem +http://en.wikipedia.org/wiki/Socialist_millionaire +http://twistedoakstudios.com/blog/Post3724_explain-it-like-im-five-the-socialist-millionaire-problem-and-secure-multi-party-computation + + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% End: