finished ZKP protocols
authorChristian Urban <christian dot urban at kcl dot ac dot uk>
Thu, 05 Nov 2015 02:11:13 +0000
changeset 422 abe178b3197e
parent 421 38ddbc59325a
child 423 11b46fa92a85
finished ZKP protocols
handouts/ho06.pdf
handouts/ho06.tex
slides/slides06.pdf
slides/slides06.tex
Binary file handouts/ho06.pdf has changed
--- a/handouts/ho06.tex	Sun Oct 25 23:34:04 2015 +0000
+++ b/handouts/ho06.tex	Thu Nov 05 02:11:13 2015 +0000
@@ -29,7 +29,8 @@
 \end{quote}
 
 \noindent
-Take the first non-article word in this definition,
+Take the first non-small word\footnote{Let's say the, a, an,
+or, and \ldots fall into the category of small words.} in this definition,
 in this case \textit{individual}, and look up the definition
 of this word, say
 
@@ -38,42 +39,42 @@
 \end{quote}
 
 \noindent 
-In this definition take the second non-article word, that
+In this definition take the second non-small word, that
 is \textit{human}, and again look up the definition of this 
 word. This will yield
 
 \begin{quote}
-``relating to \textit{or} characteristic of humankind''
+``relating to or \textit{characteristic} of humankind''
 \end{quote}
 
 \noindent 
 You could go on looking up the definition of the third
-non-article in this definition and so on. But let us assume
+non-small word in this definition and so on. But let us assume
 you agreed with Bob to stop after three iterations with the
 third non-article word in the last definition, that is
 \textit{or}. Now, instead of sending to Bob the solution
-\textit{folio}, you send to him \textit{or}. 
+\textit{folio}, you send to him \textit{characteristic}. 
 
 How can Bob verify that you know the solution? Well, once he
 solved it himself, he can use the dictionary and follow the
 same ``trail'' as you did. If the final word agrees with what
-you had sent him, he must infer you knew the solution earlier than
-him. This protocol works like a one-way hash function because
-\textit{or} does not give any hint as to what was the first
-word was. I leave you to think why this protocol avoids
-articles? 
+you had sent him, he must infer you knew the solution earlier
+than him. This protocol works like a one-way hash function
+because \textit{characteristic} does not give any hint as to
+what was the first word was. I leave you to think why this
+protocol avoids small words? 
 
 After Bob found his solution and verified that according to
-the protocol it ``maps'' also to \textit{or}, can he be
-entirely sure no cheating is going on? Not with 100\%
-certainty. It could have been possible that he was
-given \textit{or} as the word, but it derived from a
-different word. This might seem very unlikely, but at
-least theoretical it is a possibility. Protocols based on
+the protocol it ``maps'' also to \textit{characteristic}, can
+he be entirely sure no cheating is going on? Not with 100\%
+certainty. It could have been possible that he was given
+\textit{characteristic} as the word, but it derived from a
+different word. This might seem very unlikely, but at least
+theoretical it is a possibility. Protocols based on
 zero-knowledge proofs will produce a similar result---they
 give an answer that might be erroneous in a very small number
-of cases. The point is to iterate them long enough so that
-the theoretical possibility of cheating is negligibly small. 
+of cases. The point is to iterate them long enough so that the
+theoretical possibility of cheating is negligibly small. 
 
 By the way, the authors of the paper ``Dismantling Megamos
 Crypto: Wirelessly Lockpicking a Vehicle Immobilizer'' who
@@ -88,13 +89,11 @@
 made public. Bob needs to know that \textit{folio} is the
 solution before he can verify the claim of Alice that she had
 the solution first. Similarly with the car-crypto paper: we
-need to wait until the authors are finally allowed to publish
-their findings in order to verify the hash. This might happen
-at some point, but equally it might never happen (what for
-example happens if the authors lose their copy of the paper
-because of a disk failure?). Zero-knowledge proofs, in
-contrast, can be immediately checked, even if the secret is
-not public yet and perhaps never will be.
+needed to wait until September 2015 when the authors were
+finally able to publish their findings in order to verify the
+hash. Zero-knowledge proofs, in contrast, can be immediately
+checked, even if the secret is not public yet and perhaps
+never will be.
 
 \begin{figure}
 \begin{center}
@@ -157,12 +156,15 @@
 should emerge from $A$ then Alice is in trouble: Bob will find
 out she does not actually know the secret. So in order to fool
 Bob she needs to anticipate his call, and already go into the
-corresponding tunnel. This of course also does not work.
-Consequently in order to find out whether Alice cheats, Bob
-just needs to repeat this protocol many times. Each time Alice
-has a chance of $\frac{1}{2}$ to be lucky or being found out.
-Iterating this $n$ times means she must be right every time
-and when cheating the probability for this is $\frac{1}{2}^n$.
+corresponding tunnel. This of course also does not work, since
+Bob can make any call he wants. Consequently in order to find
+out whether Alice cheats, Bob just needs to repeat this
+protocol many times. Each time Alice has a chance of
+$\frac{1}{2}$ to be lucky or being found out. Iterating this
+$n$ times means she must be right every time and when
+cheating: the probability for this is $\frac{1}{2}^n$, number
+that for already relatively small $n$, say 10, is incredibly 
+small.
 
 
 There are some interesting observations we can make about 
@@ -261,18 +263,22 @@
 follow the following protocol:
 
 \begin{enumerate}
-\item Alice generates an isomorphic graph $H$ which she
-      sends to Bob. 
-\item After receiving $H$, Bob asks Alice either for an
+\item Alice generates an isomorphic graph $H$ which she sends 
+      to Bob (in each iteration she needs to generate a 
+      different $H$). 
+\item
+      After receiving $H$, Bob asks Alice either for an      
       isomorphism between $G_1$ and $H$, or $G_2$ and $H$.
 \item Alice and Bob repeat this procedure $n$ times.
 \end{enumerate}
 
 \noindent In Step 1 it is important that Alice always 
-generates a fresh isomorphic graph. As said before, 
-this is relatively easy to generate by consistently
-relabelling nodes. If she started from $G_1$, Alice will 
-have generated
+generates a fresh isomorphic graph. I let you think what
+would happen if Alice sends out twice the same graph $H$.
+
+As said before, this is relatively easy to generate by
+consistently relabelling nodes. If she started from $G_1$,
+Alice will have generated
 
 \begin{equation}
 H = \sigma'(G_1)\label{hiso}
@@ -323,7 +329,8 @@
 
 \noindent As can be seen the protocol runs for some
 agreed number of iterations. The $H_i$ Alice needs to 
-produce, need to be all distinct. I let you think why?
+produce, need to be all distinct. I hope you now know
+why?
 
 It is also crucial that in each iteration, Alice first sends
 $H_i$ and then Bob can decide which isomorphism he wants:
@@ -354,7 +361,8 @@
       $H_{1..n}$ (they need to be all distinct)
 \item she feeds the $H_{1..n}$ into a hashing function
       (for example encoded as as matrix)
-\item she takes the first $n$ bits of the output:
+\item she takes the first $n$ bits of the output of the hashing
+      function:
       whenever the output is $0$, she shows an isomorphism
       with $G_1$; for $1$ she shows an isomorphism
       with $G_2$
@@ -362,7 +370,7 @@
 
 \noindent The reason why this works and achieves the same
 goal as the interactive variant is that Alice has no
-control over the hashing functions. It would be 
+control over the hashing function. It would be 
 computationally just too hard to assemble a set of
 $H_{1..n}$ such that she can force where $0$s and $1$s
 in the hash values are such that it would pass an external
@@ -378,7 +386,7 @@
 any NP problem can be used as part of a ZKP protocol.  
 
 
-\subsubsection*{Using Modular Arithmetic for ZKP Protocols}
+\subsubsection*{Using Modular Logarithms for ZKP Protocols}
 
 While information can be encoded into graph isomorphisms, it
 is not the most convenient carrier of information. Clearly it
@@ -393,17 +401,207 @@
 \end{center} 
 
 \noindent holds. The secret Alice tries to keep secret is $x$.
-\bigskip
+The point of the modular logarithm is that it is very hard 
+from the public data to calculate $x$ (for large primes). 
+Now the protocol proceeds in three stages:
+
+\begin{itemize}
+\item {\bf Commitment Stage}
+  \begin{enumerate}
+    \item Alice generates $z$ random numbers $r_1, \ldots, r_z$, 
+      all less than $p - 1$. Alice then sends Bob for all 
+      $i = 1,\ldots, z$:
+      \[ h_i = A^{r_i}\; mod\; p\]
+    \item Bob generates $z$ random bits, say $b_1,\ldots, b_z$. He can do this 
+       by flipping $z$ times a coin, for example.
+    \item For each bit $b_i$, Alice sends Bob an $s_i$ where
+
+      \begin{center}
+      \begin{tabular}{ll}
+      if $b_i = 0$: & $s_i = r_i$\\
+      if $b_i = 1$: & $s_i = (r_i - r_j) \;mod\; (p -1)$\\
+      \end{tabular}
+      \end{center}
+
+      where $r_j$ is the lowest $j$ where $b_j = 1$.
+ 
+    \end{enumerate}
+ \end{itemize}   
+    
+\noindent For understanding the last step, let $z$ be just 4.
+We have four random values $r_i$ chosen by Alice and four
+random bits $b_i$ chosen subsequently by Bob, for example
+  
+  \begin{center}
+  \begin{tabular}{lcccc}
+  $r_i$:\; & 4 & 9 & 1 & 3\\ 
+  $b_i$:\; & 0 & 1 & 0 & 1\\
+  & & $\uparrow$ \\
+  & & $j$
+  \end{tabular}             
+  \end{center}         
+    
+  \noindent The highlighted column is the lowest where $b_i = 
+  1$ (counted from the left). That means $r_j = 9$. The reason
+  for letting Alice choose the random numbers $r_1, \ldots, r_z$
+  will become clear shortly. Next is the confirmation 
+  phase where Bob essentially checks whether Alice has sent
+  him ``correct'' $s_i$ and $h_i$.
+    
+\begin{itemize}   
+\item {\bf Confirmation Stage}
+  \begin{enumerate}
+  \item For each $b_i$ Bob checks whether $s_i$ 
+  conform to the protocol
+
+  \begin{center}
+  \begin{tabular}{ll}
+  if $b_i = 0$: & $A^{s_i} \stackrel{?}{\equiv} h_i\;mod\;p$\\
+  if $b_i = 1$: & $A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1}  \;mod\; p$\\
+  \end{tabular}
+  \end{center}
+  \end{enumerate}
+\end{itemize}
+
+\noindent To understand the case for $b_i = 1$, you have 
+to do the following calculation: 
+
+\begin{center}
+\begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l}
+$A^{s_i}$ & $=$ & $A^{r_i - r_j}$\\ 
+          & $=$ & $A^{r_i} * A^{-r_j}$\\
+          & $=$ & $h_{r_i} * h_{r_j}^{-1}\;mod\;p$   
+\end{tabular}
+\end{center}
+
+\noindent What is interesting that so far nothing has been 
+sent about $x$, which is the secret Alice has. Also notice 
+that Bob does not know $r_j$. He received 
+
+\begin{center}
+$r_j - r_j$,  $r_m - r_j$, \ldots, $r_p - r_j \;mod \;p - 1$ 
+\end{center}
+
+\noindent whenever his corresponding bits were $1$. So Bob
+does not know $r_j$ and also does not know any $r_i$ where the
+bit was $1$. Information about the $x$ is sent in the next
+stage (obviously not revealing $x$).
+
+\begin{itemize}   
+\item {\bf Proving Stage}
+
+\begin{enumerate}
+\item Alice proves she knows $x$, the discrete log of $B$,
+by sending
+
+\begin{center}
+$s_{z+1} = x - r_j\;mod\;p-1$
+\end{center}
+
+\item Bob confirms
+
+\begin{center}
+$A^{s_{z+1}} \stackrel{?}{\equiv} B * h_j^{-1} \;mod \; p$
+\end{center}
+\end{enumerate}
+\end{itemize}
+
+\noindent To understand the last step, you have to do the trick
+again that 
+
+\[A^{s_{z+1}} = A^{x-r_j} = \ldots
+\]
 
 \noindent
-\ldots still to be completed (for example can be attacked by
-MITM attacks)
+which I leave to you.
+
+Now the question is how can Alice cheat? In order to cheat she
+has to coordinate what she sends as $h_i$ in step 1 and $s_i$
+in step 3 of the commitment stage, and also what to send as
+$s_{z+1}$ in the proving stage. For the latter of course 
+Alice does not know $x$, so she just chooses some random 
+number for $s_{z+1}$ and calculates 
+
+\[A^{s_{z+1}}\]
+
+\noindent
+and then solves the equation
+
+\[A^{s_{z+1}} \equiv B * y \;mod\;p\]
+
+\noindent for $y$. This is easy since no logarithm needs to be
+computed. If Alice can guess the $j$ where the first 1 will
+appear in Bob's bit vector, then she sends the inverse of $y$
+as $h_j$ and 0 as $s_j$. However, notice that when she
+calculates a solution for $y$ she does not know $r_j$. For this she
+would need to calculate the modular logarithm
+
+
+\[
+y \equiv A^{r_j}\;mod\;p
+\] 
+
+\noindent which is hard (see step 1 in the commitment stage).
+
+Having settled on what $h_j$ should be, now what should Alice
+send as the other $h_i$ and other $s_i$? If the $b_i$ happens
+to be a 1, then the $h_i$ and other $s_i$ need to satisfy the 
+test
+
+\[A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1}  \;mod\; p\]
+
+\noindent where she has already settled on the value of 
+$h_j^{-1}$. Lets say she choses $s_i$ at random, then she just 
+needs to solve
+
+\[A^{s_i} \equiv z * h_j^{-1}  \;mod\; p\] 
+
+\noindent for $z$. Again that is easy, but it does not allow 
+us to know $r_i$, because then we would again need to solve
+a modular logarithm problem. Let us call an $h_i$ which was 
+solved the easy way as \emph{bogus}. Alice has to produce
+bogus $h_i$ for all bits that are going to be $1$ in advance!
+This means she has to guess all the bits correctly. (Yes?)
+
+Let us see what happens if she guesses wrongly: Suppose the
+bit $b_i = 1$ where she thought she will get a 0. Then she has
+already sent an $h_i$ and $h_j$ and now must find an $s_i$
+such that 
+
+\[A^{s_i} \equiv h_i * h_j^{-1}  \;mod\; p\]
+
+\noindent holds. For this remember in calculating $h_i$, she
+just chose a random $s_i$. Now she has to send a genuine one.
+But this is of course too hard. If she knew the genuine $r_i$
+and $r_j$ for $h_i$ and $h_j$, it would be easy (in this case
+$s_i = r_i - r_j$). But she does not. So she will be found
+out. If $b_i = 0$, but she thought she will get a 1, then 
+she has to send a $s_i$ which satisfies
+
+\[A^{s_i} \equiv h_i\;mod\;p\]
+
+\noindent Again she does not know $r_i$. So it is a too hard 
+task and she will be found out again.
+
+To sum up, in order for Alice to successfully cheat Bob, she
+needs to guess \emph{all} bits correctly. She has only a
+$\frac{1}{2^z}$ chance of doing this.
 
 \subsubsection*{Further Reading}
 
 Make sure you understand what NP problems
-are.\footnote{\url{http://en.wikipedia.org/wiki/NP_(complexity)}} They
-are the building blocks for zero-knowledge proofs.
+are.\footnote{\url{http://en.wikipedia.org/wiki/NP_(complexity)}}
+They are the building blocks for zero-knowledge proofs.
+Zero-Knowldege proofs are not yet widely used in production
+systems, but it is slowly gaining ground. One application
+where they pop up are crypto currencies.
+
+If you want to brush up on the modular logarithm problem,
+the Khan Academy has a nice video:
+
+\begin{center}
+\url{https://www.khanacademy.org/video/discrete-logarithm-problem}
+\end{center}
 
 \end{document}
 
Binary file slides/slides06.pdf has changed
--- a/slides/slides06.tex	Sun Oct 25 23:34:04 2015 +0000
+++ b/slides/slides06.tex	Thu Nov 05 02:11:13 2015 +0000
@@ -427,7 +427,7 @@
 
 \begin{center}
 \begin{tabular}{ll}
-\bl{$b_i = 0$}: & \bl{$A^{s_i} \equiv B\;mod\;p$}\\
+\bl{$b_i = 0$}: & \bl{$A^{s_i} \equiv h_i\;mod\;p$}\\
 \bl{$b_i = 1$}: & \bl{$A^{s_i}  \equiv h_i * h_j^{-1}  \;mod\; p$}\\
 \end{tabular}
 \end{center}\bigskip
@@ -435,7 +435,7 @@
 Bob was sent
 
 \begin{center}
-\bl{$r_j - r _j$},  \bl{$r_m - r _j$}, \ldots, \bl{$r_p - r _j$ \;mod \;p} 
+\bl{$r_j - r_j$},  \bl{$r_m - r_j$}, \ldots, \bl{$r_p - r_j \;mod \;p - 1$} 
 \end{center}
 
 where the corresponding bits were 
@@ -465,7 +465,7 @@
 \end{enumerate}\bigskip\pause
 
 In order to cheat, Alice has to guess all bits in advance. She
-has only \bl{$\frac{1}{2}^z$} chance.\bigskip\\
+has only \bl{$\frac{1}{2}^z$} chance of doing so.\bigskip\\
 
 \small\hspace{7mm}
 \textcolor{gray}{(explanation $\rightarrow$ \url{http://goo.gl/irL9GK})}