handouts/ho07.tex
changeset 309 b1ba3d88696e
parent 308 2a814c06ae03
child 310 591b62e1f86a
    13 public, for example horse owners, about the impending
    13 public, for example horse owners, about the impending
    14 novelty---a car. In my humble opinion, we are at the same
    14 novelty---a car. In my humble opinion, we are at the same
    15 stage of development with privacy. Nobody really knows what it
    15 stage of development with privacy. Nobody really knows what it
    16 is about or what it is good for. All seems very hazy. The
    16 is about or what it is good for. All seems very hazy. The
    17 result is that the world of ``privacy'' looks a little bit
    17 result is that the world of ``privacy'' looks a little bit
    18 like the old Wild West. For example, UCAS, a charity set up to
     18 like the old Wild West. Anything goes, it seems. 
    19 help students apply to universities, has a commercial unit
    19 
    20 that happily sells your email addresses to anybody who forks
    20 For example, UCAS, a charity set up to help students to apply
    21 out enough money in order to bombard you with spam. Yes, you
    21 to universities, has a commercial unit that happily sells your
    22 can opt out very often, but in case of UCAS any opt-out will
    22 email addresses to anybody who forks out enough money in order
    23 limit also legit emails you might actually be interested
     23 to bombard you with spam. Yes, you can opt out very
       
     24 often in such ``schemes'', but in the case of UCAS any opt-out
       
     25 will also limit legitimate emails you might actually be interested
    24 in.\footnote{The main objectionable point, in my opinion, is
    26 in.\footnote{The main objectionable point, in my opinion, is
    25 that the \emph{charity} everybody has to use for HE
    27 that the \emph{charity} everybody has to use for HE
    26 applications has actually very honourable goals (e.g.~assist
     28 applications actually has very honourable goals (e.g.~assist
    27 applicants in gaining access to universities), but in their
     29 applicants in gaining access to universities), but their
    28 small print (or better under the link ``About us'') reveals
     30 small print (or rather under the link ``About us'') reveals that
    29 they set up their organisation so that they can also
    31 they set up their organisation so that they can also
    30 shamelessly sell email addresses the ``harvest''. Everything
    32 shamelessly sell email addresses they ``harvest''. Everything
    31 is of course very legal\ldots{}moral?\ldots{}well that is in
    33 is of course very legal\ldots{}moral?\ldots{}well that is in
    32 the eye of the beholder. See:
    34 the eye of the beholder. See:
    33 
    35 
    34 \url{http://www.ucas.com/about-us/inside-ucas/advertising-opportunities} 
    36 \url{http://www.ucas.com/about-us/inside-ucas/advertising-opportunities} 
    35 or
    37 or
    36 \url{http://www.theguardian.com/uk-news/2014/mar/12/ucas-sells-marketing-access-student-data-advertisers}}
    38 \url{http://www.theguardian.com/uk-news/2014/mar/12/ucas-sells-marketing-access-student-data-advertisers}}
    37 
    39 
    38 Verizon, an ISP who provides you with connectivity, has found
     40 Another example: Verizon, an ISP that provides you with
    39 a ``nice'' side-business too: When you have enabled all
    41 connectivity, has found a ``nice'' side-business too: When you
    40 privacy guards in your browser, the few you have at your
    42 have enabled all privacy guards in your browser, the few you
    41 disposal, Verizon happily adds a kind of cookie to your
    43 have at your disposal, Verizon happily adds a kind of cookie
       
    44 to your
    42 HTTP-requests.\footnote{\url{http://webpolicy.org/2014/10/24/how-verizons-advertising-header-works/}}
    45 HTTP-requests.\footnote{\url{http://webpolicy.org/2014/10/24/how-verizons-advertising-header-works/}}
    43 As shown in the picture below, this cookie will be sent to
    46 As shown in the picture below, this cookie will be sent to
    44 every web-site you visit. The web-sites then can forward the
     47 every web-site you visit. The web-sites can then forward the
    45 cookie to advertisers who in turn pay Verizon to tell them
    48 cookie to advertisers who in turn pay Verizon to tell them
    46 everything they want to know about the person who just made
    49 everything they want to know about the person who just made
    59 started us yet with all the naughty things NSA \& Friends are
    62 started us yet with all the naughty things NSA \& Friends are
    60 up to. 
    63 up to. 
    61 
    64 
    62 Why does privacy matter? Nobody, I think, has a conclusive
    65 Why does privacy matter? Nobody, I think, has a conclusive
    63 answer to this question. Maybe the following four notions
    66 answer to this question. Maybe the following four notions
    64 clarify the picture somewhat: 
     67 help clarify the overall picture somewhat: 
    65 
    68 
    66 \begin{itemize}
    69 \begin{itemize}
    67 \item \textbf{Secrecy} is the mechanism used to limit the
    70 \item \textbf{Secrecy} is the mechanism used to limit the
    68       number of principals with access to information (e.g.,
    71       number of principals with access to information (e.g.,
    69       cryptography or access controls). For example I better
    72       cryptography or access controls). For example I better
    89       not like to disclose that I am pregnant, if I were
    92       not like to disclose that I am pregnant, if I were
    90       a woman, or that I am a father. Similarly, I might not
    93       a woman, or that I am a father. Similarly, I might not
    91       like to disclose my location data, because thieves might
    94       like to disclose my location data, because thieves might
    92       break into my house if they know I am away at work. 
    95       break into my house if they know I am away at work. 
    93       Privacy is essentially everything which `shouldn't be
    96       Privacy is essentially everything which `shouldn't be
    94       anybodies business'.
    97       anybody's business'.
    95 
    98 
    96 \end{itemize}
    99 \end{itemize}
    97 
   100 
    98 \noindent While this might provide us with some rough
   101 \noindent While this might provide us with some rough
    99 definitions, the problem with privacy is that it is an
    102 definitions, the problem with privacy is that there is an
   100 extremely fine line what should stay private and what should
    103 extremely fine line between what should stay private and what should
   101 not. For example, since I am working in academia, I am very
   104 not. For example, since I am working in academia, I am very
   102 happy to be essentially a digital exhibitionist: I am happy to
    105 happy to be a digital exhibitionist: I readily
   103 disclose all `trivia' related to my work on my personal
   106 disclose all `trivia' related to my work on my personal
   104 web-page. This is a kind of bragging that is normal in
   107 web-page. This is a kind of bragging that is normal in
   105 academia (at least in the CS field). I am even happy that
   108 academia (at least in the field of CS), even expected if you
   106 Google maintains a profile about all of my academic papers and
   109 look for a job. I am even happy that Google maintains a
   107 their citations. 
   110 profile about all my academic papers and their citations. 
   108 
   111 
   109 On the other hand I would be very peeved if anybody had a too
    112 On the other hand, I would be very irritated if anybody I do
   110 close look on my private live---it shouldn't be anybodies
    113 not know had too close a look at my private life---it
   111 business. The reason is that knowledge about my private life
   114 shouldn't be anybody's business. The reason is that knowledge
   112 usually is used against me. As mentioned above, public
    115 about my private life is usually used against me. As mentioned
   113 location data might mean I get robbed. If supermarkets build a
   116 above, public location data might mean I get robbed. If
   114 profile of my shopping habits, they will use it to
   117 supermarkets build a profile of my shopping habits, they will
   115 \emph{their} advantage---surely not to \emph{my} advantage.
   118 use it to \emph{their} advantage---surely not to \emph{my}
   116 Also whatever might be collected about my life will always be
   119 advantage. Also whatever might be collected about my life will
   117 an incomplete, or even misleading, picture---I am sure my
   120 always be an incomplete, or even misleading, picture---for
   118 creditworthiness score was temporarily(?) destroyed by not
   121 example I am sure my creditworthiness score was temporarily(?)
   119 having a regular income in this country (before coming to
   122 destroyed by not having a regular income in this country
   120 King's I worked in Munich). To correct such incomplete or
   123 (before coming to King's I worked in Munich for five years).
   121 flawed data there is, since recently, a law that allows you to
   124 To correct such incomplete or flawed credit history data there
   122 check what information is held about you for determining your
    125 is now a law that allows you to check what
       
   126 information is held about you for determining your
   123 creditworthiness. But this concerns only a very small part of
   127 creditworthiness. But this concerns only a very small part of
   124 the data that is held about me/you.
   128 the data that is held about me/you.
   125 
   129 
   126 This is an endless field. I let you ponder about the two
    130 To cut a long story short, I leave you to ponder the two
   127 statements that are often float about in discussions about
    131 statements that are often voiced in discussions about privacy:
   128 privacy:
       
   129 
   132 
   130 \begin{itemize}
   133 \begin{itemize}
   131 \item \textit{``You have zero privacy anyway. Get over it.''}\\
   134 \item \textit{``You have zero privacy anyway. Get over it.''}\\
   132 \mbox{}\hfill{}Scott Mcnealy (CEO of Sun)
    135 \mbox{}\hfill{}{\small{}by Scott McNealy (CEO of Sun)}
   133 
   136 
   134 \item \textit{``If you have nothing to hide, you have nothing 
   137 \item \textit{``If you have nothing to hide, you have nothing 
   135 to fear.''}
   138 to fear.''}
   136 \end{itemize}
   139 \end{itemize}
   137  
   140  
   138 \noindent There are some technical problems that are easier to
   141 \noindent An article that attempts a deeper analysis appeared
   139 discuss and that often have privacy implications. The problem
    142 in 2011 in the Chronicle of Higher Education:
   140 I want to focus on is how to safely disclose datasets. What
   143 
   141 can go wrong with this can be illustrated with three examples:
   144 \begin{center} 
       
   145 \url{http://chronicle.com/article/Why-Privacy-Matters-Even-if/127461/} 
       
   146 \end{center} 
       
   147 
       
   148 \noindent Funnily, or maybe not so funnily, the author of this
       
    149 article carefully tries to construct an argument that attacks

    150 the nothing-to-hide statement not only in cases where

    151 governments \& Co collect people's deepest secrets, or

    152 pictures of people's naked bodies, but also in cases where

    153 governments ``only'' collect data

    154 relevant to, say, preventing terrorism. The irony is of course

    155 that in 2011 we could simply not imagine that respected governments
       
   156 would do such infantile things as intercepting people's nude
       
    157 photos. Well, since Snowden we know that some people at the NSA did

    158 exactly that and shared such photos among colleagues as a ``fringe
       
   159 benefit''.  
       
   160 
       
   161 
       
   162 \subsubsection*{Re-Identification Attacks} 
       
   163 
       
   164 Apart from philosophical arguments, there are fortunately also
       
   165 some real technical problems with privacy implications. The
       
   166 problem I want to focus on in this handout is how to safely
       
   167 disclose datasets containing potentially private data, say
       
   168 health data. What can go wrong with such disclosures can be
       
   169 illustrated with four examples:
   142 
   170 
   143 \begin{itemize}
   171 \begin{itemize}
   144 \item In 2006 a then young company called Netflix offered a 1
   172 \item In 2006, a then young company called Netflix offered a 1
   145       Mio \$ prize to anybody who could improve their movie
    173       million dollar prize to anybody who could improve their movie
   146       rating algorithm. For this they disclosed a dataset
   174       rating algorithm. For this they disclosed a dataset
   147       containing 10\% of all Netflix users (appr.~500K). They
   175       containing 10\% of all Netflix users at the time
   148       removed names, but included numerical ratings as well as
    176       (approx.~500K). They removed names, but included numerical
   149       times of ratings. Though some information was perturbed
    177       ratings of movies as well as times of ratings, though
   150       (i.e., slightly modified).
   178       some information was perturbed (i.e., slightly
       
   179       modified).
   151       
   180       
   152       Two researchers took that data and compared it with
   181       Two researchers had a closer look at this anonymised
   153       public data available from the International Movie
   182       data and compared it with public data available from the
   154       Database (IMDb). They found that 98 \% of the entries
   183       International Movie Database (IMDb). They found that 98
   155       could be re-identified: either by their ratings or by
   184       \% of the entries could be re-identified in the Netflix
   156       the dates the ratings were uploaded. 
   185       dataset: either by their ratings or by the dates the
   157 
   186       ratings were uploaded. The result was a class-action 
   158 \item In the 1990, medical databases were routinely made
   187       suit against Netflix, which was only recently resolved
   159       publicised for research purposes. This was done in
    188       and involved a lot of money.
   160       anonymised form with names removed, but birth dates,
   189 
   161       gender, ZIP-code were retained.
    190 \item In the 1990s, medical datasets were often made public for
       
   191       research purposes. This was done in anonymised form with
       
    192       names removed, but birth dates, gender and ZIP-code were
       
   193       retained. In one case where such data was made public
       
   194       about state employees in Massachusetts, the then
       
   195       governor assured the public that the released dataset
       
   196       protected patient privacy by deleting identifiers. A
       
   197       graduate student could not resist and cross-referenced
       
   198       public voter data with the data about birth dates,
       
    199       gender and ZIP-code. The result was that she could send
       
    200       the governor his own hospital record (a small sketch of such a linkage attack is given after this list).
       
   201  
       
   202 \item In 2006, AOL published 20 million Web search queries
       
    203       collected from 650,000 users (names had been deleted).
       
   204       This was again for research purposes. However, within
       
    205       days an old lady, Thelma Arnold, from Lilburn, Georgia
       
    206       (11,596 inhabitants), was identified as user No.~4417749
       
   207       in this dataset. It turned out that search engine
       
   208       queries are windows into people's private lives. 
       
   209   
       
    210 \item Genome-Wide Association Studies (GWAS) was a public
       
   211       database of gene-frequency studies linked to diseases.
       
    212       You only needed partial DNA information in order to
       
    213       identify whether an individual was part of the study. The
       
    214       database was closed in 2008.
   162       
   215       
   163 \end{itemize}
   216 \end{itemize}
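\noindent To make the linkage idea in the examples above a bit more
concrete, here is a minimal sketch of such a re-identification
attack (written in Python; all names, records and field names are
invented for illustration). An ``anonymised'' medical table, which
contains no names at all, is joined with a public voter roll on the
quasi-identifiers birth date, gender and ZIP-code.

{\small
\begin{verbatim}
# Minimal sketch of a linkage (re-identification) attack.
# All names, records and field names below are invented.

# "Anonymised" medical data: names removed, quasi-identifiers kept.
medical = [
    {"dob": "1945-07-31", "sex": "M", "zip": "02138", "diag": "hypertension"},
    {"dob": "1982-01-15", "sex": "F", "zip": "02139", "diag": "asthma"},
]

# Public voter roll: names *and* the same quasi-identifiers.
voters = [
    {"name": "A. Jones",  "dob": "1945-07-31", "sex": "M", "zip": "02138"},
    {"name": "B. Miller", "dob": "1982-01-15", "sex": "F", "zip": "02139"},
]

# Index the voter roll by the quasi-identifier triple.
by_quasi_id = {(v["dob"], v["sex"], v["zip"]): v["name"] for v in voters}

# Join the two tables on the triple: a match re-identifies the
# medical record, even though the record itself contains no name.
for rec in medical:
    name = by_quasi_id.get((rec["dob"], rec["sex"], rec["zip"]))
    if name is not None:
        print(name, "->", rec["diag"])
\end{verbatim}}

\noindent Removing names alone clearly does not achieve much here:
the remaining attributes act as a fingerprint. A real attack would
additionally check that each birth-date/gender/ZIP triple is unique
in both datasets, but for a surprisingly large fraction of the
population it is.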
   164 
   217 
   165 
   218 
   166 \end{document}
   219 \end{document}