Smooth Anonymity for Sparse Graphs

Alessandro Epasto, Hossein Esfandiari, Vahab Mirrokni, Andres Munoz Medina

TL;DR

This work proves that any differentially private mechanism that maintains a reasonable similarity with the initial dataset is doomed to have a very weak privacy guarantee, and designs a simple large-scale algorithm that efficiently provides smooth-k-anonymity.

Abstract

When working with user data, providing well-defined privacy guarantees is paramount. In this work, we aim to manipulate and share an entire sparse dataset with a third party privately. Differential privacy has emerged as the gold standard of privacy; however, when it comes to sharing sparse datasets, e.g., sparse networks, we prove as one of our main results that any differentially private mechanism that maintains a reasonable similarity with the initial dataset is doomed to have a very weak privacy guarantee. In such situations, we need to look into other privacy notions such as $k$-anonymity. We consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity. We further perform an empirical evaluation to back our theoretical guarantees and show that our algorithm improves the performance in downstream machine learning tasks on anonymized data.

Paper Structure (32 sections, 15 theorems, 25 equations, 7 figures, 3 tables, 1 algorithm)

Key Result

theorem 1

Let $G=(U \cup F, E) \in \mathbb{G}$ be a graph and let $\delta > 0$. Let $\mathcal{M}$ be a mechanism that generates a graph according to the randomized response algorithm with $p = \frac{2}{1 + e^\epsilon}$. Let $Q = |U||F|$ and $C(\delta) = \frac{\log(2/\delta)}{2}$. If $\frac{p}{4}\geq \sqrt{\frac{C(\delta)}{Q \cdots}}$ [...], where $\lambda := \lambda(G)$.
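
The randomized response algorithm itself is not reproduced on this page, so the following is only a minimal sketch under a stated assumption: each of the $Q = |U||F|$ potential edges is, independently with probability $p = \frac{2}{1 + e^\epsilon}$, replaced by a uniformly random bit, and otherwise kept. Under that assumption each potential edge is flipped with probability $p/2 = \frac{1}{1 + e^\epsilon}$, which is the classical $\epsilon$-randomized-response rate per edge. The function names and the dense-matrix representation are illustrative choices, not taken from the paper.

```python
import math

import numpy as np


def resample_probability(epsilon):
    """p = 2 / (1 + e^epsilon), as in the theorem statement (needs epsilon >= 0)."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return 2.0 / (1.0 + math.exp(epsilon))


def randomized_response(adj, epsilon, rng=None):
    """Perturb a binary |U| x |F| adjacency matrix.

    ASSUMPTION (not confirmed by the source): each cell is replaced by a
    uniform random bit with probability p and kept unchanged otherwise.
    """
    rng = np.random.default_rng() if rng is None else rng
    adj = np.asarray(adj, dtype=bool)
    p = resample_probability(epsilon)
    resample = rng.random(adj.shape) < p       # cells to be resampled
    random_bits = rng.random(adj.shape) < 0.5  # uniform replacement bits
    return np.where(resample, random_bits, adj)
```

For a sparse graph this makes the difficulty visible: with a small $\epsilon$, roughly a $p/2$ fraction of the $Q$ potential edges is switched on, which can far exceed the number of true edges.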

Figures (7)

  • Figure 1: Depiction of $k$-anonymity with suppression and smooth-$k$-anonymity for $k=4$. (left) Original input graph $G$. (center) Graph that is $k$-anonymous with suppression. Notice the removal of the edges to the first and last feature. (right) Smooth-$k$-anonymous graph: we preserve the edges to the first feature and add a new edge to it.
  • Figure 2: The $\epsilon$ necessary for a given Jaccard similarity, as a function of density.
  • Figure 3: Mean Jaccard similarity for the various datasets and algorithms.
  • Figure 4: Accuracy in the learning task on anonymized data. We also include three baselines: using only the majority label, training a model without anonymity, and a model that uses node differential privacy with $\epsilon=10$.
  • Figure 5: Mean Jaccard similarity vs k for additional datasets.
  • ...and 2 more figures
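
Several of the figures above report a mean Jaccard similarity. As a rough illustration only, the sketch below assumes this means the Jaccard similarity between each user's original and anonymized feature sets, averaged over users; the paper's exact averaging is not given on this page. The helper names, the dict-of-sets representation, and the convention that two empty sets have similarity 1 are assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two collections of features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention for two empty feature sets
    return len(a & b) / len(a | b)


def mean_jaccard(original, anonymized):
    """Average per-user Jaccard similarity.

    Both arguments map a user id to that user's set of features; a user
    missing from `anonymized` is treated as having no features.
    """
    if not original:
        raise ValueError("original must contain at least one user")
    total = sum(jaccard(feats, anonymized.get(user, ()))
                for user, feats in original.items())
    return total / len(original)
```

For example, a user with features {1, 2} whose anonymized features are {2, 3} contributes a similarity of 1/3.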

Theorems & Definitions (29)

  • definition 1
  • definition 2
  • definition 3: Node differential privacy
  • definition 4: $k$-anonymization and $k$-anonymization by suppression
  • definition 5
  • definition 6
  • theorem 1
  • corollary 1
  • theorem 2
  • proof
  • ...and 19 more