sen-material: comparison handouts/ho06.tex


-:38ddbc59325a
+:abe178b3197e
 loose as one of a series or forming part of a bound volume,
 which is numbered on the recto or front side only.''
 \end{quote}
 \noindent
-Take the first non-article word in this definition,
+Take the first non-small word\footnote{Let's say the, a, an,
+or, and \ldots fall into the category of small words.} in this definition,
 in this case \textit{individual}, and look up the definition
 of this word, say
 \begin{quote}
 ``a single \textit{human} being as distinct from a group''
 \end{quote}
 \noindent
-In this definition take the second non-article word, that
+In this definition take the second non-small word, that
 is \textit{human}, and again look up the definition of this
 word. This will yield
 \begin{quote}
-``relating to \textit{or} characteristic of humankind''
+``relating to or \textit{characteristic} of humankind''
 \end{quote}
 \noindent
 You could go on looking up the definition of the third
-non-article in this definition and so on. But let us assume
+non-small word in this definition and so on. But let us assume
 you agreed with Bob to stop after three iterations with the
 third non-small word in the last definition, that is
 \textit{characteristic}. Now, instead of sending to Bob the solution
-\textit{folio}, you send to him \textit{or}.
+\textit{folio}, you send to him \textit{characteristic}.
 How can Bob verify that you know the solution? Well, once he
 solved it himself, he can use the dictionary and follow the
 same ``trail'' as you did. If the final word agrees with what
-you had sent him, he must infer you knew the solution earlier than
+you had sent him, he must infer you knew the solution earlier
-him. This protocol works like a one-way hash function because
+than him. This protocol works like a one-way hash function
-\textit{or} does not give any hint as to what was the first
+because \textit{characteristic} does not give any hint as to
-word was. I leave you to think why this protocol avoids
+what the first word was. I leave you to think about why this
-articles?
+protocol avoids small words.
 After Bob found his solution and verified that according to
-the protocol it ``maps'' also to \textit{or}, can he be
+the protocol it ``maps'' also to \textit{characteristic}, can
-entirely sure no cheating is going on? Not with 100\%
+he be entirely sure no cheating is going on? Not with 100\%
-certainty. It could have been possible that he was
+certainty. It could have been possible that he was given
-given \textit{or} as the word, but it derived from a
+\textit{characteristic} as the word, but it derived from a
-different word. This might seem very unlikely, but at
+different word. This might seem very unlikely, but at least
-least theoretical it is a possibility. Protocols based on
+theoretically it is a possibility. Protocols based on
 zero-knowledge proofs will produce a similar result---they
 give an answer that might be erroneous in a very small number
-of cases. The point is to iterate them long enough so that
+of cases. The point is to iterate them long enough so that the
-the theoretical possibility of cheating is negligibly small.
+theoretical possibility of cheating is negligibly small.
 By the way, the authors of the paper ``Dismantling Megamos
 Crypto: Wirelessly Lockpicking a Vehicle Immobilizer'' who
 were barred from publishing their results also used a hash to
 prove they did the work and (presumably) managed to get into
 method is: yes, we can hide the secret temporarily, but if
 somebody else wants to verify it, then the secret has to be
 made public. Bob needs to know that \textit{folio} is the
 solution before he can verify the claim of Alice that she had
 the solution first. Similarly with the car-crypto paper: we
-need to wait until the authors are finally allowed to publish
+needed to wait until September 2015 when the authors were
-their findings in order to verify the hash. This might happen
+finally able to publish their findings in order to verify the
-at some point, but equally it might never happen (what for
+hash. Zero-knowledge proofs, in contrast, can be immediately
-example happens if the authors lose their copy of the paper
+checked, even if the secret is not public yet and perhaps
-because of a disk failure?). Zero-knowledge proofs, in
+never will be.
-contrast, can be immediately checked, even if the secret is
-not public yet and perhaps never will be.
 \begin{figure}
 \begin{center}
 \addtolength{\fboxsep}{4mm}
 \fbox{\includegraphics[scale=0.4]{../pics/Dismantling_Megamos_Crypto.png}}
 for example the $B$-segment of the tunnel. If now Bob says she
 should emerge from $B$, she is lucky. But if he says she
 should emerge from $A$ then Alice is in trouble: Bob will find
 out she does not actually know the secret. So in order to fool
 Bob she needs to anticipate his call, and already go into the
-corresponding tunnel. This of course also does not work.
+corresponding tunnel. This of course also does not work, since
-Consequently in order to find out whether Alice cheats, Bob
+Bob can make any call he wants. Consequently in order to find
-just needs to repeat this protocol many times. Each time Alice
+out whether Alice cheats, Bob just needs to repeat this
-has a chance of $\frac{1}{2}$ to be lucky or being found out.
+protocol many times. Each time Alice has a chance of
-Iterating this $n$ times means she must be right every time
+$\frac{1}{2}$ to be lucky or to be found out. Iterating this
-and when cheating the probability for this is $\frac{1}{2}^n$.
+$n$ times means she must be right every time, and when
+cheating the probability for this is $\left(\frac{1}{2}\right)^n$, a
+number that already for relatively small $n$, say 10, is very
+small (about $0.001$).
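To get a feeling for how quickly $\left(\frac{1}{2}\right)^n$ shrinks, the cave protocol can be simulated. The following is only a minimal sketch in Python: the function names, and the modelling of Alice's guess and Bob's call as independent fair coin flips, are illustrative assumptions and not part of the handout.

```python
import random
from fractions import Fraction


def cheater_survives(n, rng):
    """Return True if a cheating Alice guesses Bob's call in all n rounds.

    Assumption: Alice does not know the secret, so she can only emerge
    from the tunnel segment she entered; she survives a round only if
    Bob happens to call exactly that segment.
    """
    for _ in range(n):
        alice_segment = rng.choice("AB")  # segment Alice walked into
        bob_call = rng.choice("AB")       # segment Bob asks her to emerge from
        if alice_segment != bob_call:
            return False                  # found out in this round
    return True


def estimate_cheating_probability(n, trials, rng):
    """Estimate the survival probability by repeated simulation."""
    wins = sum(cheater_survives(n, rng) for _ in range(trials))
    return wins / trials


def exact_cheating_probability(n):
    """The exact value (1/2)^n as a fraction."""
    return Fraction(1, 2) ** n
```

For example, the exact probability for $n = 10$ is $\frac{1}{1024}$, and the simulated estimate for small $n$ should lie close to the exact value.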
 There are some interesting observations we can make about
 Alibaba's cave and the ZKP protocol between Alice and Bob:
 graphs, say $G_1$ and $G_2$, using the same idea as in the
 example with Alibaba's cave. For this Alice and Bob must
 follow the following protocol:
 \begin{enumerate}
-\item Alice generates an isomorphic graph $H$ which she
+\item Alice generates an isomorphic graph $H$ which she sends
-sends to Bob.
+to Bob (in each iteration she needs to generate a
-\item After receiving $H$, Bob asks Alice either for an
+different $H$).
+\item
+After receiving $H$, Bob asks Alice either for an
 isomorphism between $G_1$ and $H$, or $G_2$ and $H$.
 \item Alice and Bob repeat this procedure $n$ times.
 \end{enumerate}
 \noindent In Step 1 it is important that Alice always
-generates a fresh isomorphic graph. As said before,
+generates a fresh isomorphic graph. I leave you to think about what
-this is relatively easy to generate by consistently
+would happen if Alice sends out the same graph $H$ twice.
-relabelling nodes. If she started from $G_1$, Alice will
-have generated
+As said before, this is relatively easy to generate by
+consistently relabelling nodes. If she started from $G_1$,
+Alice will have generated
 \begin{equation}
 H = \sigma'(G_1)\label{hiso}
 \end{equation}
 \end{tabular}
 \end{center}
 \noindent As can be seen the protocol runs for some
 agreed number of iterations. The $H_i$ Alice needs to
-produce, need to be all distinct. I let you think why?
+produce need to be all distinct. I hope you now know
+why.
 It is also crucial that in each iteration, Alice first sends
 $H_i$ and then Bob can decide which isomorphism he wants:
 either $G_1 \leftrightarrow H_i$ or $G_2 \leftrightarrow H_i$.
 If somehow Alice can find out before she committed to $H_i$,
 \begin{enumerate}
 \item Alice generates $n$ isomorphic graphs
 $H_{1..n}$ (they need to be all distinct)
 \item she feeds the $H_{1..n}$ into a hashing function
 (for example encoded as a matrix)
-\item she takes the first $n$ bits of the output:
+\item she takes the first $n$ bits of the output of the hashing
+function:
 whenever the output is $0$, she shows an isomorphism
 with $G_1$; for $1$ she shows an isomorphism
 with $G_2$
 \end{enumerate}
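The three steps above can be sketched in Python. This is only an illustration under several assumptions that are not in the handout: graphs are encoded as sorted edge lists rather than matrices, SHA-256 is used as the hashing function (so $n$ can be at most 256), all function names are made up, and the sketch does not enforce that the $H_{1..n}$ are distinct.

```python
import hashlib
import random


def relabel(edges, perm):
    """Apply a node relabelling (a dict) to an undirected edge list."""
    return sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)


def random_relabelling(nodes, rng):
    """Pick a random permutation of the nodes as a dict."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    return dict(zip(nodes, shuffled))


def challenge_bits(graphs, n):
    """Hash the encoded graphs and return the first n bits of the digest."""
    digest = hashlib.sha256(repr(graphs).encode()).digest()
    bits = [(byte >> k) & 1 for byte in digest for k in range(7, -1, -1)]
    return bits[:n]


def prove(g1, sigma, nodes, n, rng):
    """Alice's side; she knows sigma with G2 = relabel(G1, sigma)."""
    sigma_inv = {v: u for u, v in sigma.items()}
    perms = [random_relabelling(nodes, rng) for _ in range(n)]
    hs = [relabel(g1, q) for q in perms]          # step 1: the H_i
    bits = challenge_bits(hs, n)                  # steps 2 and 3
    answers = []
    for q, b in zip(perms, bits):
        if b == 0:
            answers.append(q)                     # isomorphism G1 -> H_i
        else:
            # isomorphism G2 -> H_i: undo sigma, then apply q
            answers.append({v: q[sigma_inv[v]] for v in nodes})
    return hs, answers


def check(g1, g2, hs, answers):
    """Anyone can recompute the bits and test the published isomorphisms."""
    bits = challenge_bits(hs, len(hs))
    return all(
        relabel(g1 if b == 0 else g2, a) == h
        for h, a, b in zip(hs, answers, bits)
    )
```

Because the verifier recomputes the challenge bits from the published $H_{1..n}$, no interaction with Alice is needed to run the check.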
 \noindent The reason why this works and achieves the same
 goal as the interactive variant is that Alice has no
-control over the hashing functions. It would be
+control over the hashing function. It would be
 computationally just too hard to assemble a set of
 $H_{1..n}$ such that she can force where $0$s and $1$s
 in the hash values are such that it would pass an external
 test. The point is that Alice can publish all this data
 from the comfort of her own web-page, for example, and
 that encoding any secret into a graph-isomorphism, while
 possible, is awkward. The good news is that in fact
 any NP problem can be used as part of a ZKP protocol.
-\subsubsection*{Using Modular Arithmetic for ZKP Protocols}
+\subsubsection*{Using Modular Logarithms for ZKP Protocols}
 While information can be encoded into graph isomorphisms, it
 is not the most convenient carrier of information. Clearly it
 is much easier to encode information into numbers. Let us look
 at zero-knowledge proofs that use numbers as secrets. For this
 \begin{center}
 $A^x \equiv B\; mod\; p$
 \end{center}
 \noindent holds. The secret Alice tries to keep secret is $x$.
-\bigskip
+The point of the modular logarithm is that it is very hard
+from the public data to calculate $x$ (for large primes).
+Now the protocol proceeds in three stages:
+\begin{itemize}
+\item {\bf Commitment Stage}
+\begin{enumerate}
+\item Alice generates $z$ random numbers $r_1, \ldots, r_z$,
+all less than $p - 1$. Alice then sends Bob for all
+$i = 1,\ldots, z$:
+\[ h_i = A^{r_i}\; mod\; p\]
+\item Bob generates $z$ random bits, say $b_1,\ldots, b_z$. He can do this
+by flipping a coin $z$ times, for example.
+\item For each bit $b_i$, Alice sends Bob an $s_i$ where
+\begin{center}
+\begin{tabular}{ll}
+if $b_i = 0$: & $s_i = r_i$\\
+if $b_i = 1$: & $s_i = (r_i - r_j) \;mod\; (p -1)$\\
+\end{tabular}
+\end{center}
+where $j$ is the lowest index with $b_j = 1$.
+\end{enumerate}
+\end{itemize}
+\noindent For understanding the last step, let $z$ be just 4.
+We have four random values $r_i$ chosen by Alice and four
+random bits $b_i$ chosen subsequently by Bob, for example
+\begin{center}
+\begin{tabular}{lcccc}
+$r_i$:\; & 4 & 9 & 1 & 3\\
+$b_i$:\; & 0 & 1 & 0 & 1\\
+& & $\uparrow$ \\
+& & $j$
+\end{tabular}
+\end{center}
+\noindent The highlighted column is the lowest where $b_i =
+1$ (counted from the left). That means $r_j = 9$. The reason
+for letting Alice choose the random numbers $r_1, \ldots, r_z$
+will become clear shortly. Next is the confirmation
+phase where Bob essentially checks whether Alice has sent
+him ``correct'' $s_i$ and $h_i$.
+\begin{itemize}
+\item {\bf Confirmation Stage}
+\begin{enumerate}
+\item For each $b_i$ Bob checks whether $s_i$
+conforms to the protocol
+\begin{center}
+\begin{tabular}{ll}
+if $b_i = 0$: & $A^{s_i} \stackrel{?}{\equiv} h_i\;mod\;p$\\
+if $b_i = 1$: & $A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1}  \;mod\; p$\\
+\end{tabular}
+\end{center}
+\end{enumerate}
+\end{itemize}
+\noindent To understand the case for $b_i = 1$, you have
+to do the following calculation:
+\begin{center}
+\begin{tabular}{r@{\hspace{1mm}}c@{\hspace{1mm}}l}
+$A^{s_i}$ & $=$ & $A^{r_i - r_j}$\\
+& $=$ & $A^{r_i} * A^{-r_j}$\\
+& $=$ & $h_i * h_j^{-1}\;mod\;p$
+\end{tabular}
+\end{center}
+\noindent What is interesting is that so far nothing has been
+sent about $x$, which is the secret Alice has. Also notice
+that Bob does not know $r_j$. He received
+\begin{center}
+$r_j - r_j$,  $r_m - r_j$, \ldots, $r_k - r_j \;mod \;(p - 1)$
+\end{center}
+\noindent whenever his corresponding bits were $1$. So Bob
+does not know $r_j$ and also does not know any $r_i$ where the
+bit was $1$. Information about $x$ is sent in the next
+stage (obviously not revealing $x$).
+\begin{itemize}
+\item {\bf Proving Stage}
+\begin{enumerate}
+\item Alice proves she knows $x$, the discrete log of $B$,
+by sending
+\begin{center}
+$s_{z+1} = (x - r_j)\;mod\;(p-1)$
+\end{center}
+\item Bob confirms
+\begin{center}
+$A^{s_{z+1}} \stackrel{?}{\equiv} B * h_j^{-1} \;mod \; p$
+\end{center}
+\end{enumerate}
+\end{itemize}
+\noindent To understand the last step, you have to do the trick
+again that
+\[A^{s_{z+1}} = A^{x-r_j} = \ldots
+\]
 \noindent
-\ldots still to be completed (for example can be attacked by
+which I leave to you.
-MITM attacks)
+Now the question is how can Alice cheat? In order to cheat she
+has to coordinate what she sends as $h_i$ in step 1 and $s_i$
+in step 3 of the commitment stage, and also what to send as
+$s_{z+1}$ in the proving stage. For the latter of course
+Alice does not know $x$, so she just chooses some random
+number for $s_{z+1}$ and calculates
+\[A^{s_{z+1}}\]
+\noindent
+and then solves the equation
+\[A^{s_{z+1}} \equiv B * y \;mod\;p\]
+\noindent for $y$. This is easy since no logarithm needs to be
+computed. If Alice can guess the $j$ where the first 1 will
+appear in Bob's bit vector, then she sends the inverse of $y$
+as $h_j$ and 0 as $s_j$. However, notice that when she
+calculates a solution for $y$ she does not know $r_j$. For this she
+would need to calculate the modular logarithm
+\[
+y^{-1} \equiv A^{r_j}\;mod\;p
+\]
+\noindent which is hard (see step 1 in the commitment stage).
+Having settled on what $h_j$ should be, now what should Alice
+send as the other $h_i$ and other $s_i$? If the $b_i$ happens
+to be a 1, then the $h_i$ and other $s_i$ need to satisfy the
+test
+\[A^{s_i} \stackrel{?}{\equiv} h_i * h_j^{-1}  \;mod\; p\]
+\noindent where she has already settled on the value of
+$h_j^{-1}$. Let's say she chooses $s_i$ at random, then she just
+needs to solve
+\[A^{s_i} \equiv h_i * h_j^{-1}  \;mod\; p\]
+\noindent for $h_i$. Again that is easy, but it does not allow
+her to know $r_i$, because then she would again need to solve
+a modular logarithm problem. Let us call an $h_i$ which was
+solved the easy way \emph{bogus}. Alice has to produce
+bogus $h_i$ for all bits that are going to be $1$ in advance!
+This means she has to guess all the bits correctly. (Yes?)
+Let us see what happens if she guesses wrongly: Suppose the
+bit $b_i = 1$ where she thought she would get a 0. Then she has
+already sent an $h_i$ and $h_j$ and now must find an $s_i$
+such that
+\[A^{s_i} \equiv h_i * h_j^{-1}  \;mod\; p\]
+\noindent holds. For this remember in calculating $h_i$, she
+just chose a random $s_i$. Now she has to send a genuine one.
+But this is of course too hard. If she knew the genuine $r_i$
+and $r_j$ for $h_i$ and $h_j$, it would be easy (in this case
+$s_i = r_i - r_j$). But she does not. So she will be found
+out. If $b_i = 0$, but she thought she would get a 1, then
+she has to send an $s_i$ which satisfies
+\[A^{s_i} \equiv h_i\;mod\;p\]
+\noindent Again she does not know $r_i$. So the task is too
+hard and she will be found out again.
+To sum up, in order for Alice to successfully cheat Bob, she
+needs to guess \emph{all} bits correctly. She has only a
+$\frac{1}{2^z}$ chance of doing this.
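The three stages of the protocol can be traced in a short Python sketch. It is not a hardened implementation: the function names are hypothetical, the prime in the usage note is far too small to be secure, and the sketch assumes that at least one of Bob's bits is $1$ (the handout does not say what happens otherwise). The modular inverse $h_j^{-1}$ is computed with the three-argument \texttt{pow}, which accepts a negative exponent from Python 3.8 on.

```python
import secrets


def commit(A, p, z):
    """Commitment, step 1: Alice picks r_1..r_z below p - 1
    and publishes h_i = A^(r_i) mod p."""
    r = [secrets.randbelow(p - 1) for _ in range(z)]
    h = [pow(A, r_i, p) for r_i in r]
    return r, h


def respond(r, bits, x, p):
    """Commitment, step 3, and proving stage: Alice answers Bob's bits.

    j is the lowest index with b_j = 1 (assumed to exist)."""
    j = bits.index(1)
    s = [r_i if b == 0 else (r_i - r[j]) % (p - 1)
         for r_i, b in zip(r, bits)]
    s_last = (x - r[j]) % (p - 1)   # this is s_{z+1}
    return s, s_last


def verify(A, B, p, h, bits, s, s_last):
    """Confirmation and proving stage: Bob's checks."""
    j = bits.index(1)
    h_j_inv = pow(h[j], -1, p)      # modular inverse of h_j
    for h_i, b, s_i in zip(h, bits, s):
        expected = h_i if b == 0 else (h_i * h_j_inv) % p
        if pow(A, s_i, p) != expected:
            return False
    return pow(A, s_last, p) == (B * h_j_inv) % p
```

With toy values such as $p = 23$, $A = 5$ and $x = 6$, an honest run with Bob's bits $0, 1, 0, 1$ passes all checks, while changing $s_{z+1}$ makes the final check fail.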
 \subsubsection*{Further Reading}
 Make sure you understand what NP problems
-are.\footnote{\url{http://en.wikipedia.org/wiki/NP_(complexity)}} They
+are.\footnote{\url{http://en.wikipedia.org/wiki/NP_(complexity)}}
-are the building blocks for zero-knowledge proofs.
+They are the building blocks for zero-knowledge proofs.
+Zero-knowledge proofs are not yet widely used in production
+systems, but they are slowly gaining ground. One application
+where they pop up is cryptocurrencies.
+If you want to brush up on the modular logarithm problem,
+the Khan Academy has a nice video:
+\begin{center}
+\url{https://www.khanacademy.org/video/discrete-logarithm-problem}
+\end{center}
 \end{document}
 http://blog.cryptographyengineering.com/2014/11/zero-knowledge-proofs-illustrated-primer.html

changeset 422	abe178b3197e
parent 419	667a39dda86e
child 423	11b46fa92a85