handouts/ho06.tex
author Christian Urban <christian dot urban at kcl dot ac dot uk>
Mon, 05 Oct 2015 12:47:04 +0100
changeset 393 cb308583d86c
parent 370 ddac52c0014c
child 419 667a39dda86e
permissions -rw-r--r--
updated

\documentclass{article}
\usepackage{../style}

\begin{document}
\fnote{\copyright{} Christian Urban, 2014}

\section*{Handout 6 (Zero-Knowledge Proofs)}

Zero-knowledge proofs (short ZKP) solve a paradoxical puzzle:
How to convince somebody else that one knows a secret, without
revealing what the secret actually is? This sounds like a
problem the Mad Hatter from Alice in Wonderland would occupy
himself with, but actually there some serious and not so
serious applications of it. For example, if you solve
crosswords with your friend, say Bob, you might want to
convince him that you found a solution for one question, but
of course you do not want to reveal the solution, as this
might give Bob an advantage somewhere else in the crossword.

So how to convince Bob that you know the answer (or a secret)?
One way would be to come up with the following protocol:
Suppose the answer is \textit{folio}. Then look up the
definition of \textit{folio} in a dictionary. Say you find:

\begin{quote}
``an \textit{individual} leaf of paper or parchment, either
loose as one of a series or forming part of a bound volume,
which is numbered on the recto or front side only.''
\end{quote}

\noindent
Take the first non-article word in this definition,
in this case \textit{individual}, and look up the definition
of this word, say

\begin{quote}
``a single \textit{human} being as distinct from a group''
\end{quote}

\noindent 
In this definition take the second non-article word, that
is \textit{human}, and again look up the definition of this 
word. This will yield

\begin{quote}
``relating to \textit{or} characteristic of humankind''
\end{quote}

\noindent 
You could go on looking up the definition of the third
non-article in this definition and so on. But let us assume
you agreed with Bob to stop after three iterations with the
third non-article word in the last definition, that is
\textit{or}. Now, instead of sending to Bob the solution
\textit{folio}, you send to him \textit{or}. 

How can Bob verify that you know the solution? Well, once he
solved it himself, he can use the dictionary and follow the
same ``trail'' as you did. If the final word agrees with what
you had sent him, he must infer you knew the solution earlier than
him. This protocol works like a one-way hash function because
\textit{or} does not give any hint as to what was the first
word was. I leave you to think why this protocol avoids
articles? 

After Bob found his solution and verified that according to
the protocol it ``maps'' also to \textit{or}, can he be
entirely sure no cheating is going on? Not with 100\%
certainty. It could have been possible that he was
given \textit{or} as the word, but it derived from a
different word. This might seem very unlikely, but at
least theoretical it is a possibility. Protocols based on
zero-knowledge proofs will produce a similar result---they
give an answer that might be erroneous in a very small number
of cases. The point is to iterate them long enough so that
the theoretical possibility of cheating is negligibly small. 

By the way, the authors of the paper ``Dismantling Megamos
Crypto: Wirelessly Lockpicking a Vehicle Immobilizer'' who
were barred from publishing their results used also a hash to
prove they did the work and (presumably) managed to get into
cars without a key; see Figure~\ref{paper}. This is very
similar to the method above about crosswords: They like to
prove that they did the work, but not giving out the
``solution''. But this also shows what the problem with such a
method is: yes, we can hide the secret temporarily, but if
somebody else wants to verify it, then the secret has to be
made public. Bob needs to know that \textit{folio} is the
solution before he can verify the claim of Alice that she had
the solution first. Similarly with the car-crypto paper: we
need to wait until the authors are finally allowed to publish
their findings in order to verify the hash. This might happen
at some point, but equally it might never happen (what for
example happens if the authors lose their copy of the paper
because of a disk failure?). Zero-knowledge proofs, in
contrast, can be immediately checked, even if the secret is
not public yet and perhaps never will be.

\begin{figure}
\begin{center}
\addtolength{\fboxsep}{4mm}
\fbox{\includegraphics[scale=0.4]{../pics/Dismantling_Megamos_Crypto.png}}
\end{center}
\caption{The authors of this paper used a hash in order to prove
later that they have managed to break into cars.\label{paper}}
\end{figure}


\subsubsection*{ZKP: An Illustrative Example}

The idea behind zero-knowledge proofs is not very obvious and
will surely take some time for you to digest. Therefore let us
start with a simple illustrative example. The example will not
be perfect, but hopefully explain the gist of the idea. The
example is called Alibaba's cave, which graphically looks as 
follows:\footnote{The example is taken from an 
article titled ``How to Explain Zero-Knowledge Protocols
to Your Children'' available from 
\url{http://pages.cs.wisc.edu/~mkowalcz/628.pdf}.}

\begin{center}
\begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c}
\includegraphics[scale=0.1]{../pics/alibaba1.png} &
\includegraphics[scale=0.1]{../pics/alibaba2.png} &
\includegraphics[scale=0.1]{../pics/alibaba3.png} \\
Step 1 & Step 2 & Step 3
\end{tabular}
\end{center}

\noindent Let us take a closer look at the picture in Step 1:
The cave has a tunnel which forks at some point. Both forks
are connected in a loop. At the deep end of the loop is a
magic wand. The point of the magic wand is that Alice knows
the secret word for how to open it. She wants to keep the word
secret, but wants to convince Bob that she knows it.

One way of course would be to let Bob follow her, but then he
would also find out the secret. So this does not work. To find
a solution, let us first fix the rules: At the beginning Alice
and Bob are outside the cave. Alice goes in alone and takes
either tunnel labelled $A$ in the picture, or the other tunnel
labelled $B$. She waits at the magic wand for the instructions
from Bob, who also goes into the gave and observes what
happens at the fork. He has no knowledge which tunnel Alice
took and calls out (in Step 2) that she should emerge from tunnel
$A$, for example. If she knows the secret for opening the
wand, this will not be a problem for Alice. If she was already
in the $A$-segment of the tunnel, then she just comes back. If
she was in the $B$-segment of the tunnel she will say the magic
word and goes through the wand to emerge from $A$ as requested
by Bob.

Let us have a look at the case where Alice cheats, that is not
knows the secret. She would still go into the cave and use,
for example the $B$-segment of the tunnel. If now Bob says she
should emerge from $B$, she is lucky. But if he says she
should emerge from $A$ then Alice is in trouble: Bob will find
out she does not actually know the secret. So in order to fool
Bob she needs to anticipate his call, and already go into the
corresponding tunnel. This of course also does not work.
Consequently in order to find out whether Alice cheats, Bob
just needs to repeat this protocol many times. Each time Alice
has a chance of $\frac{1}{2}$ to be lucky or being found out.
Iterating this $n$ times means she must be right every time
and when cheating the probability for this is $\frac{1}{2}^n$.


There are some interesting observations we can make about 
Alibaba's cave and the ZKP protocol between Alice and Bob:

\begin{itemize}
\item {\bf Completeness} If Alice knows the secret, Bob
      accepts Alice ``proof'' for sure. There is no error
      possible in that Bob thinks Alice cheats when she
      actually knows the secret. 
\item {\bf Soundness} If Alice does not know the secret,
      Bob accepts her ``proof'' with a very small probability.
      If, as in the example above, the probability of being
      able to hide cheating is $\frac{1}{2}$ in each round 
      it will be $\frac{1}{2}^n$ after $n$-rounds, which even
      for small $n$ say $> 10$ is very small indeed.
\item {\bf Zero-Knowledge} Even if Bob accepts
      the proof by Alice, he cannot convince anybody else.
\end{itemize} 

\noindent The last property is the most interesting one.
Assume Alice has convinced Bob that she knows the secret and
Bob filmed the whole protocol with a camera. Can he use the
video to convince anybody else? After a moment of thought, you
will agree that this is not the case. Alice and Bob might have
just made it all up and colluded by Bob telling Alice
beforehand which tunnel he will call. In this way it appears
as if all iterations of the protocol were successful, but they
prove nothing. If another person wants to find out whether
Alice knows the secret, he or she would have to conduct the
protocol again. This is actually the formal definition of a
zero-knowledge proof: an independent observer cannot
distinguish between a real protocol (where Alice knows the
secret) and a fake one (where Bob and Alice colluded).


\subsubsection*{Using an Graph-Isomorphism Problem for ZKPs}

Now the question is how can we translate Alibaba's cave into a
computer science solution? It turns out we need an NP problem
for that. The main feature of an NP problem is that it is
computational very hard to generate a solution, but it is very
easy to check whether a given solution indeed solves the
problem at hand.\footnote{The question whether $P = NP$ or not,
we leave to others to speculate about.} 

One NP problem is the \emph{graph isomorphism problem}. It
essentially determines whether the following two graphs, say
$G_1$ and $G_2$, can be moved and stretched so that they look
exactly the same.

\begin{center}
\begin{tabular}{c@{\hspace{8mm}}c@{\hspace{8mm}}c}
$G_1$ & $G_2$\\
\raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple.png}} &
\raisebox{-13mm}{\includegraphics[scale=0.4]{../pics/simple-b.png}}&

\footnotesize
\begin{tabular}{rl}
Graph $G_1$	& Graph $G_2$\\
a	& 1\\
b	& 6\\
c	& 8\\
d	& 3\\
g	& 5\\
h	& 2\\
i  &  4\\
j  & 7\\
\end{tabular}
\end{tabular}
\end{center}

\noindent The table on the right gives a mapping of the nodes
of the first graph to the nodes of the second. With this
mapping we can check: node $a$ is connected in $G_1$ with $g$,
$h$ and $i$. Node $a$ maps to node $1$ in $G_2$, which is
connected to $2$, $4$ and $5$, which again correspond via the
mapping to $h$, $i$ and $g$ respectively. Let us write
$\sigma$ for such a table and let us write

\[G_1 = \sigma(G_2)\]

\noindent for two isomorphic graphs ($\sigma$ being the
isomorphism). It is actually very easy to construct two
isomorphic graphs: Start with an arbitrary graph, re-label the
nodes consistently. Alice will need to do this frequently 
for the protocol below. In order to be useful, we therefore
would need to consider substantially larger graphs, like
with thousand nodes. 

Now the secret which Alice tries to hide is the knowledge of
such an isomorphism $\sigma$ between two such graphs. But she
can convince Bob whether she knows such an isomorphism for two
graphs, say $G_1$ and $G_2$, using the same idea as in the
example with Alibaba's cave. For this Alice and Bob must
follow the following protocol:

\begin{enumerate}
\item Alice generates an isomorphic graph $H$ which she
      sends to Bob. 
\item After receiving $H$, Bob asks Alice either for an
      isomorphism between $G_1$ and $H$, or $G_2$ and $H$.
\item Alice and Bob repeat this procedure $n$ times.
\end{enumerate}

\noindent In Step 1 it is important that Alice always 
generates a fresh isomorphic graph. As said before, 
this is relatively easy to generate by consistently
relabelling nodes. If she started from $G_1$, Alice will 
have generated

\begin{equation}
H = \sigma'(G_1)\label{hiso}
\end{equation}
 
\noindent where $\sigma'$ is the isomorphism between $H$ and
$G_1$. But she could equally have started from $G_2$. In the 
case where $G_1$ and $G_2$ are isomorphic, if $H$ is 
isomorphic with $G_1$, it will also be isomorphic with 
$G_2$, and vice versa. 

After generating $H$, Alice sends it to Bob. The important
point is that she needs to ``commit'' to this $H$, therefore
this kind of zero-knowledge protocols are called
\emph{commitment protocols}. Only after receiving $H$, Bob
will make up his mind about which isomorphism he asks
for---whether between $H$ and $G_1$ or $H$ and $G_2$. For this
he could flip a coin, since the choice should be as
unpredictable for Alice as possible. Once Alice receives the
request, she has to produce an isomorphism. If she generated
$H$ as shown in \eqref{hiso} and is asked for an isomorphism
between $H$ and $G_1$, she just sends $\sigma'$. If she had
been asked for an isomorphism between $H$ and $G_2$, she just
has to compose her secret isomorphism $\sigma$ and $\sigma'$.
The main point for the protocol is that even knowing the
isomorphism between $H$ and $G_1$ or $H$ and $G_2$, will not
make the task easier to find the isomorphism between $G_1$ and
$G_2$, which is the secret Alice tries to protect.

In order to make it crystal clear how this protocol proceeds, 
let us give a version using our more formal notation for 
protocols:

\begin{center}
\begin{tabular}{lrl}
0)  & $A \to B:$ & $G_1$ and $G_2$\\
1a) & $A \to B:$ & $H_1$ \\
1b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_1$? 
 \quad(or $G_2 \leftrightarrow H_1$)\\
1c) & $A \to B:$ & requested isomorphism\\
2a) & $A \to B:$ & $H_2$\\
2b) & $B \to A:$ & produce isomorphism $G_1 \leftrightarrow H_2$? 
 \quad(or $G_2 \leftrightarrow H_2$)\\
2c) & $A \to B:$ & requested isomorphism\\
    & \ldots\\
\end{tabular}
\end{center}

\noindent As can be seen the protocol runs for some
agreed number of iterations. The $H_i$ Alice needs to 
produce, need to be all distinct. I let you think why?

It is also crucial that in each iteration, Alice first sends
$H_i$ and then Bob can decide which isomorphism he wants:
either $G_1 \leftrightarrow H_i$ or $G_2 \leftrightarrow H_i$.
If somehow Alice can find out before she committed to $H_i$,
she can cheat. For this assume Alice does \emph{not} know an
isomorphism between $G_1$ and $G_2$. If she knows which
isomorphism Bob will ask for she can craft $H$ in such a way
that it is isomorphism with either $G_1$ or $G_2$ (but it
cannot with both). Then in each case she would send Bob
a correct answer and he would come to the conclusion that
all is well. I let you also answer the question whether
such a protocol run between Alice and Bob would convince
a third person, say Pete.
 
Since the interactive nature of the above PKZ protocol and the
correct ordering of the messages is so important for the
``correctness'' of the protocol, it might look surprising that
the same goal can also me achieved in a completely offline
manner. By this I mean Alice can publish all data at once, and
then at a later time, Bob can inspect the data and come to the
conclusion whether or not Alice knows the secret (again
without actually learning about the secret). For this
Alice has to do the following:

\begin{enumerate}
\item Alice generates $n$ isomorphic graphs
      $H_{1..n}$ (they need to be all distinct)
\item she feeds the $H_{1..n}$ into a hashing function
      (for example encoded as as matrix)
\item she takes the first $n$ bits of the output:
      whenever the output is $0$, she shows an isomorphism
      with $G_1$; for $1$ she shows an isomorphism
      with $G_2$
\end{enumerate}

\noindent The reason why this works and achieves the same
goal as the interactive variant is that Alice has no
control over the hashing functions. It would be 
computationally just too hard to assemble a set of
$H_{1..n}$ such that she can force where $0$s and $1$s
in the hash values are such that it would pass an external
test. The point is that Alice can publish all this data
on the comfort of her own web-page, for example, and 
in this way convince everybody who bothers to check. 

The virtue of the use of graphs and isomorphism for a 
zero-knowledge protocol is that the idea why it works
are relatively straightforward. The disadvantage is
that encoding any secret into a graph-isomorphism, while
possible, is awkward. The good news is that in fact
any NP problem can be used as part of a ZKP protocol.  


\subsubsection*{Using Modular Arithmetic for ZKP Protocols}

While information can be encoded into graph isomorphisms, it
is not the most convenient carrier of information. Clearly it
is much easier to encode information into numbers. Let us look
at zero-knowledge proofs that use numbers as secrets. For this
the underlying NP-problem is to calculate discrete logarithms.
It can be used by choosing public numbers $A$, $B$, $p$, and
private $x$ such that

\begin{center}
$A^x \equiv B\; mod\; p$
\end{center} 

\noindent holds. The secret Alice tries to keep secret is $x$.
\bigskip

\noindent
\ldots still to be completed (for example can be attacked by
MITM attacks)

\subsubsection*{Further Reading}

Make sure you understand what NP problems
are.\footnote{\url{http://en.wikipedia.org/wiki/NP_(complexity)}} They
are the building blocks for zero-knowledge proofs.

\end{document}

http://blog.cryptographyengineering.com/2014/11/zero-knowledge-proofs-illustrated-primer.html

http://btravers.weebly.com/uploads/6/7/2/9/6729909/zero_knowledge_technique.pdf
http://zk-ssh.cms.ac/docs/Zero_Knowledge_Prinzipien.pdf
http://www.wisdom.weizmann.ac.il/~oded/PS/zk-tut02v4.ps

socialist millionares problem
http://en.wikipedia.org/wiki/Socialist_millionaire
http://twistedoakstudios.com/blog/Post3724_explain-it-like-im-five-the-socialist-millionaire-problem-and-secure-multi-party-computation


%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End: