% handouts/ho07.tex

\noindent How disgusting! Even worse, Verizon is not known for
being the cheapest ISP on the planet (quite the contrary), nor
is it known for providing the fastest possible speeds, but
rather for being among the few ISPs in the US with a
quasi-monopolistic ``market distribution''.

Well, we could go on and on\ldots{}and we have not even
started with all the naughty things the NSA \& Friends are
up to. Why does privacy matter? Nobody, I think, has a
conclusive answer to this question yet. Maybe the following
four notions help to clarify the overall picture somewhat:

\begin{itemize}
\item \textbf{Secrecy} is the mechanism used to limit the
      number of principals with access to information (e.g.,
      cryptography or access controls). For example, I better
\end{itemize}

\noindent While this might provide us with some rough
definitions, the problem with privacy is that there is an
extremely fine line between what should stay private and what
should not. For example, since I am working in academia, I am
every so often a very happy digital exhibitionist: I gladly
disclose all `trivia' related to my work on my personal
web-page. This is a kind of bragging that is normal in
academia (at least in the field of CS), even expected if you
are looking for a job. I am even happy that Google maintains a
profile of all my academic papers and their citations.

On the other hand, I would be very irritated if anybody I do
not know had too close a look at my private life---it
shouldn't be anybody's business. The reason is that knowledge
is, since recently, a law that allows you to check what
information is held about you for determining your
creditworthiness. But this concerns only a very small part of
the data that is held about me/you.

Take the example of Stephen Hawking: when he was diagnosed
with his disease, he was given a life expectancy of two years.
If an employer had known about such problems, would they have
employed Hawking? Now he is well past his 70th birthday.
Clearly, personal medical data needs to stay private.

To cut a long story short, I will let you ponder over the two
statements that are often voiced in discussions about privacy:

\begin{itemize}
\item \textit{``You have zero privacy anyway. Get over it.''}\\
article carefully tries to construct an argument that not
only attacks the nothing-to-hide statement in cases where
governments \& Co collect people's deepest secrets, or
pictures of people's naked bodies, but one that also applies
in cases where governments ``only'' collect data relevant to,
say, preventing terrorism. The funny thing is, of course, that
in 2011 we simply could not imagine that respected governments
would do such infantile things as intercepting people's nude
photos. Well, since Snowden we know that some people at the
NSA did exactly that and then shared such photos among
colleagues as a ``fringe benefit''.

\subsubsection*{Re-Identification Attacks}

Apart from philosophical musings, there are fortunately also
some real technical problems with privacy. The problem I want
to focus on in this handout is how to safely disclose datasets
containing potentially private data, say health data. What can
go wrong with such disclosures can be illustrated with four
well-known examples; a small code sketch of the underlying
linkage attack follows after the list:

   171 \begin{itemize}
   177 \begin{itemize}
   172 \item In 2006, a then young company called Netflix offered a 1
   178 \item In 2006, a then young company called Netflix offered a 1
   173       Mio \$ prize to anybody who could improve their movie
   179       Mio \$ prize to anybody who could improve their movie
   174       rating algorithm. For this they disclosed a dataset
   180       rating algorithm. For this they disclosed a dataset
   185       dataset: either by their ratings or by the dates the
   191       dataset: either by their ratings or by the dates the
   186       ratings were uploaded. The result was a class-action 
   192       ratings were uploaded. The result was a class-action 
   187       suit against Netflix, which was only recently resolved
   193       suit against Netflix, which was only recently resolved
   188       involving a lot of money.
   194       involving a lot of money.
   189 
   195 
\item In the 1990s, medical datasets were often made public
      for research purposes. This was done in anonymised form
      with names removed, but birth dates, gender and ZIP-code
      were retained. In one case, where such data about
      hospital visits of state employees in Massachusetts was
      made public, the then governor assured the public that
      the released dataset protected patient privacy by
      deleting identifiers. A graduate student could not
      resist cross-referencing public voter data with the
      released data, including birth dates, gender and
      ZIP-code. The result was that she could send the
      governor his own hospital record. It turns out that
      birth date, gender and ZIP-code uniquely identify 87\%
      of people in the US.

\item In 2006, AOL published 20 million Web search queries
      collected from 650,000 users (names had been deleted).
      This was again done for research purposes. However,
      within days an old lady, Thelma Arnold, from Lilburn,
      Georgia (11,596 inhabitants), was identified as user
      No.~4417749 in this dataset. It turned out that search
      engine queries are deep windows into people's private
      lives.

\item Genome-Wide Association Studies (GWAS) was a public
      database of gene-frequency studies linked to diseases.
      It turned out that partial DNA information was enough
      to identify whether an individual had been part of a
      study; the database was closed in 2008.

\end{itemize}
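
The Massachusetts example is an instance of a \emph{linkage
attack}: two datasets that look harmless on their own are
joined on shared quasi-identifiers. The following Python
sketch, with hypothetical file and column names such as
\texttt{voters.csv} and \texttt{hospital.csv}, shows how
little code such an attack needs, assuming both datasets
contain the triple of birth date, gender and ZIP-code:

\begin{verbatim}
# Minimal sketch of a linkage (re-identification) attack.
# File and column names are hypothetical; the two CSV files
# are assumed to share birth date, gender and ZIP-code.
import csv
from collections import defaultdict

def load(path):
    with open(path, newline='') as f:
        return list(csv.DictReader(f))

# Index the public voter roll by the quasi-identifier
# triple; voters.csv is assumed to contain a 'name' column.
voters = defaultdict(list)
for row in load('voters.csv'):
    key = (row['birthdate'], row['gender'], row['zip'])
    voters[key].append(row['name'])

# A medical record is re-identified whenever its triple
# matches exactly one voter (hospital.csv has names removed).
for rec in load('hospital.csv'):
    key = (rec['birthdate'], rec['gender'], rec['zip'])
    names = voters[key]
    if len(names) == 1:
        print(names[0], '->', rec['diagnosis'])
\end{verbatim}

\noindent The 87\% figure quoted above is the same counting
argument run over the whole US population: for most people
the triple is unique, so a single match is the common case.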