A Differentially Private Clustering Algorithm for Well-Clustered Graphs

Weiqiang He; Hendrik Fichtenberger; Pan Peng

TL;DR

This work tackles differentially private clustering on well-clustered graphs, defining clusters via inner and outer conductance in a $c$-balanced $(k,\phi_{\mathrm{in}},\phi_{\mathrm{out}})$-clusterable model. It introduces an SDP-based DP clustering pipeline that adds Gaussian noise to a structured SDP solution and then applies a spectral embedding with $k$-means to recover a $k$-partition, achieving utility close to that of non-private baselines. The analysis uses the stability of generalized strongly convex optimization to bound sensitivity, yielding $(\epsilon,\delta)$-DP guarantees and a utility bound that depends on the graph parameters, edge density, and privacy budget. Experiments on SBM data show a better privacy-utility trade-off than a DP baseline, and a lower bound shows that any pure $\epsilon$-DP algorithm incurs substantial error, which motivates the $(\epsilon,\delta)$-DP relaxation.
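The noise, embedding, and $k$-means stages summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the input `X` is a hypothetical stand-in for the SDP solution (here a 0/1 cluster-membership matrix), `noise_std` is taken as given rather than derived from the sensitivity analysis, and the function name and the simple Lloyd-style $k$-means are choices made for the example.

```python
import numpy as np

def noisy_spectral_clustering(X, k, noise_std, rng, iters=50):
    """Perturb a symmetric matrix with Gaussian noise, embed, and run k-means.

    X is assumed to be a symmetric n x n matrix standing in for the SDP
    solution; noise_std is assumed to be supplied by a privacy analysis.
    """
    n = X.shape[0]
    # Symmetric Gaussian noise: sample the upper triangle and mirror it.
    W = rng.normal(0.0, noise_std, size=(n, n))
    W = np.triu(W) + np.triu(W, 1).T
    Y = X + W

    # Spectral embedding: rows of the top-k eigenvectors (eigh sorts ascending).
    _, vecs = np.linalg.eigh(Y)
    F = vecs[:, -k:]

    # Farthest-point initialization followed by Lloyd iterations.
    centers = [F[0]]
    for _ in range(1, k):
        d = np.min([np.sum((F - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(d)])
    centers = np.array(centers)

    for _ in range(iters):
        dist = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = F[labels == j].mean(axis=0)
    return labels
```

On a planted two-block membership matrix with small noise, the returned labels should be constant within each block; how much noise the embedding tolerates depends on the spectral gap of `X`.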

Abstract

We study differentially private (DP) algorithms for recovering clusters in well-clustered graphs, which are graphs whose vertex set can be partitioned into a small number of sets, each inducing a subgraph of high inner conductance and small outer conductance. Such graphs have widespread application as a benchmark in the theoretical analysis of spectral clustering. We provide an efficient $(\epsilon,\delta)$-DP algorithm tailored specifically for such graphs. Our algorithm draws inspiration from the recent work of Chen et al., who developed DP algorithms for recovery of stochastic block models in cases where the graph comprises exactly two nearly-balanced clusters. Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters, and its misclassification ratio almost matches that of the best-known non-private algorithms. We conduct experimental evaluations on datasets with known ground truth clusters to substantiate the prowess of our algorithm. We also show that any (pure) $\epsilon$-DP algorithm would result in substantial error.
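To make "small outer conductance" concrete, the sketch below computes the outer conductance of a vertex set, assuming the standard definition $\phi_{\mathrm{out}}(C) = |E(C, V\setminus C)| / \mathrm{vol}(C)$, where $\mathrm{vol}(C)$ is the sum of the degrees in $C$. The adjacency-dictionary representation and the function names are assumptions made for this example; inner conductance is omitted because computing it exactly requires minimizing over subsets of the cluster.

```python
def volume(adj, S):
    """Sum of degrees of the vertices in S (adj maps vertex -> set of neighbours)."""
    return sum(len(adj[v]) for v in S)

def outer_conductance(adj, S):
    """Fraction of the volume of S that leaves S: |E(S, V \\ S)| / vol(S)."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol = volume(adj, S)
    return cut / vol if vol > 0 else 0.0

# Two triangles joined by the single edge (2, 3).
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

# For the left triangle: one cut edge, volume 2 + 2 + 3 = 7.
phi = outer_conductance(two_triangles, [0, 1, 2])
```

In this toy graph each triangle is dense inside and has a single outgoing edge, which is the qualitative picture behind a well-clustered graph.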

Paper Structure (16 sections, 19 theorems, 31 equations, 1 table, 1 algorithm)


Introduction
Related work
Preliminaries
Differential Privacy
Useful tools
$k$-means and spectral clustering
$\bm{k}$-means
Stability of generalized strongly convex optimization
Private clustering for well-clustered graphs
The algorithm
Proof of Theorem 1
Privacy of the algorithm
Utility of the algorithm
Experiments
Lower Bound
...and 1 more sections

Key Result

Theorem 1

Let $G=(V,E)$ be a $c$-balanced $(k,\phi_\text{in},\phi_\text{out})$-clusterable graph with ground-truth partition $\{C_i\}_{i\in [k]}$, where $\frac{\phi_\text{out}}{\phi_\text{in}^2}=O(k^{-4})$. Then there exists an algorithm that, for any $c,k,\phi_\text{in},\phi_\text{out}$ and graph $G$ [...] with probability $1-\exp(-\Omega(n))$, where $\sigma$ is a permutation over $[k]:=\{1,\dots,k\}$. [...]
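Because cluster labels are only identifiable up to a permutation $\sigma$ over $[k]$, any error measure has to minimize over relabelings. The sketch below is one simple way to do that, assuming labels in $\{0,\dots,k-1\}$ and counting misclassified vertices; the theorem's own error measure is truncated in the statement above and may be expressed differently (for example, in terms of volume). The function name is hypothetical, and the brute-force search over all $k!$ permutations is only practical for small $k$.

```python
from itertools import permutations

def misclassification_ratio(truth, pred, k):
    """Smallest fraction of vertices mislabeled over all relabelings of pred.

    truth and pred are equal-length sequences of labels in range(k).
    """
    n = len(truth)
    best = n
    for sigma in permutations(range(k)):
        # sigma[p] is the ground-truth label that predicted label p is mapped to.
        errors = sum(1 for t, p in zip(truth, pred) if sigma[p] != t)
        best = min(best, errors)
    return best / n

truth = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 1, 0, 0, 1]   # clusters swapped, plus one vertex in the wrong cluster
ratio = misclassification_ratio(truth, pred, k=2)
```

For larger $k$, the same minimum can be found in polynomial time by solving an assignment problem on the $k \times k$ confusion matrix.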

Theorems & Definitions (32)

Definition 1.1: Well-clustered graph
Theorem 1
Theorem 2: informal
Definition 2.1: Differential privacy
Definition 2.2: Sensitivity of a function
Lemma 2.3: Gaussian mechanism
Lemma 2.4: Concentration of spectral norm of Gaussian matrices
Lemma 2.5: Weyl's inequality
Lemma 2.6: Davis-Kahan $\sin(\theta)$-Theorem davis1970rotation
Lemma 2.6: peng2015partitioning
...and 22 more
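The sensitivity definition and Gaussian mechanism listed above (Definition 2.2 and Lemma 2.3) fit together as in the following sketch. It assumes the classical calibration $\sigma = \Delta_2 \sqrt{2\ln(1.25/\delta)}/\epsilon$, which is known to give $(\epsilon,\delta)$-DP for $0<\epsilon<1$; the constants in the paper's Lemma 2.3 may be stated differently, and the function names here are hypothetical.

```python
import math
import random

def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (assumes 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("this calibration assumes 0 < epsilon < 1 and 0 < delta < 1")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def gaussian_mechanism(values, l2_sensitivity, epsilon, delta, rng):
    """Add i.i.d. N(0, sigma^2) noise to every coordinate of a query answer."""
    sigma = gaussian_sigma(l2_sensitivity, epsilon, delta)
    return [v + rng.gauss(0.0, sigma) for v in values]

# Example: release a 3-dimensional answer whose L2 sensitivity is assumed to be 1.
rng = random.Random(0)
noisy = gaussian_mechanism([10.0, 20.0, 30.0], 1.0, epsilon=0.5, delta=1e-5, rng=rng)
```

The noise scale grows linearly with the sensitivity, which is why a tight sensitivity bound on the released object matters for utility.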
