Smooth Anonymity for Sparse Graphs

Alessandro Epasto; Hossein Esfandiari; Vahab Mirrokni; Andres Munoz Medina

TL;DR

This work proves that any differentially private mechanism that maintains a reasonable similarity with the initial dataset is doomed to have a very weak privacy guarantee, and designs a simple large-scale algorithm that efficiently provides smooth-k-anonymity.

Abstract

When working with user data, providing well-defined privacy guarantees is paramount. In this work, we aim to privately manipulate and share an entire sparse dataset with a third party. Differential privacy has emerged as the gold standard of privacy; however, when it comes to sharing sparse datasets, e.g., sparse networks, we prove as one of our main results that \emph{any} differentially private mechanism that maintains a reasonable similarity with the initial dataset is doomed to have a very weak privacy guarantee. In such situations, we need to look into other privacy notions such as $k$-anonymity. We consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity. We further perform an empirical evaluation to back our theoretical guarantees and show that our algorithm improves the performance in downstream machine learning tasks on anonymized data.

Paper Structure

This paper contains 32 sections, 15 theorems, 25 equations, 7 figures, 3 tables, and 1 algorithm.

Introduction
Related work
Setup
Comparison of privacy notions
Comparison with differential privacy
Comparison with $k$-anonymity by suppression
Algorithms and Analysis
Preliminaries
Initial algorithm
Improved algorithm
Experimental results
Conclusion
Acknowledgement
Technical lemma
Omitted proofs from Section \ref{sec:dp-comparison}
...and 17 more sections

Key Result

theorem 1

Let $G=(U \cup F, E) \in \mathbb{G}$ be a graph and let $\delta > 0$. Let $\mathcal{M}$ be a mechanism that generates a graph according to Algorithm \ref{alg:randomized_response} with $p = \frac{2}{1 + e^\epsilon}$. Let $Q = |U||F|$ and $C(\delta) = \frac{\log(2/\delta)}{2}$. If $\frac{p}{4}\geq \sqrt{\frac{C(\delta)}{Q}}$ [...] where $\lambda := \lambda(G)$.
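To make the quantities in the theorem concrete, here is a minimal Python sketch of a randomized-response mechanism on a bipartite user-feature graph. The mechanism shown is an assumption: the referenced algorithm is not reproduced here, so the sketch uses the textbook variant in which each of the $Q = |U||F|$ potential edges is resampled with probability $p = \frac{2}{1 + e^\epsilon}$ and replaced by a fair coin flip. The function names `randomized_response`, `jaccard`, and `c_delta` are hypothetical and chosen only for illustration.

```python
import math
import random


def randomized_response(users, features, edges, epsilon, rng):
    """Assumed textbook randomized response over all |U||F| potential edges.

    With probability p = 2 / (1 + e^epsilon) a potential edge is replaced
    by a fair coin flip; otherwise its original state is kept.
    """
    p = 2.0 / (1.0 + math.exp(epsilon))
    noisy = set()
    for u in users:
        for f in features:
            present = (u, f) in edges
            if rng.random() < p:
                # Resample: the edge is present with probability 1/2.
                present = rng.random() < 0.5
            if present:
                noisy.add((u, f))
    return noisy


def jaccard(a, b):
    """Jaccard similarity of two edge sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def c_delta(delta):
    """C(delta) = log(2 / delta) / 2, as defined in the theorem."""
    return math.log(2.0 / delta) / 2.0


if __name__ == "__main__":
    rng = random.Random(0)
    users = range(50)
    features = range(50)
    # A sparse graph: one edge per user, 50 edges out of Q = 2500 pairs.
    edges = {(u, u) for u in users}
    noisy = randomized_response(users, features, edges, 1.0, rng)
    print("Jaccard similarity at epsilon = 1:", jaccard(edges, noisy))
```

On a sparse input, most of the $Q$ potential edges are absent, so even a modest flip probability adds far more spurious edges than there are true ones. This is the intuition behind the low similarity at small $\epsilon$, under the assumptions stated above.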

Figures (7)

Figure 1: Depiction of $k$-anonymity with suppression for $k=4$. (left) Original input graph $G$. (center) Graph made $k$-anonymous by suppression; notice the removal of the edges to the first and last features. (right) Smooth-$k$-anonymous graph: we preserve the edges to the first feature and add a new edge to it.
Figure 2: The $\epsilon$ necessary to achieve a given Jaccard similarity, as a function of density.
Figure 3: Mean Jaccard similarity for the various datasets and algorithms.
Figure 4: Accuracy in the learning task on anonymized data. We also include three baselines: predicting only the majority label, training a model without anonymity, and a model that uses node differential privacy with $\epsilon=10$.
Figure 5: Mean Jaccard similarity vs. $k$ for additional datasets.
...and 2 more figures
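The $k$-anonymity property depicted in Figure 1 can be checked with a short Python sketch. It assumes the standard notion for a bipartite user-feature graph, where every user's feature set must be shared by at least $k$ users in total; the paper's formal definitions are not reproduced here, and the name `is_k_anonymous` is hypothetical.

```python
from collections import Counter


def is_k_anonymous(neighborhoods, k):
    """Check the assumed standard k-anonymity condition.

    `neighborhoods` maps each user to an iterable of features. The graph
    passes if every distinct feature set is held by at least k users.
    """
    counts = Counter(frozenset(feats) for feats in neighborhoods.values())
    return all(c >= k for c in counts.values())


if __name__ == "__main__":
    graph = {"a": {1, 2}, "b": {2, 1}, "c": {3}}
    print(is_k_anonymous(graph, 2))  # user "c" has a unique feature set
```

Suppression would reach this condition by only removing edges, whereas the right panel of Figure 1 also adds one.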

Theorems & Definitions (29)

definition 1
definition 2
definition 3: Node differential privacy
definition 4: $k$-anonymization and $k$-anonymization by suppression
definition 5
definition 6
theorem 1
corollary 1
theorem 2
proof
...and 19 more
