A Differentially Private Clustering Algorithm for Well-Clustered Graphs
Weiqiang He, Hendrik Fichtenberger, Pan Peng
TL;DR
This work tackles differentially private clustering on well-clustered graphs, framing clusters via inner/outer conductance and a $c$-balanced $(k,\phi_{\mathrm{in}},\phi_{\mathrm{out}})$ model. It introduces an SDP-based DP clustering pipeline that adds Gaussian noise to a structured SDP solution and then uses a spectral embedding with $k$-means to recover a $k$-partition, achieving utility close to non-private baselines. The analysis leverages stability of generalized strongly convex optimization to bound sensitivity, yielding $(\epsilon,\delta)$-DP guarantees and a concrete utility bound that depends on the graph parameters, edge density, and privacy budget. An experimental evaluation on stochastic block model (SBM) data demonstrates improved privacy-utility trade-offs relative to a DP baseline, and a lower bound shows that any pure $\epsilon$-DP algorithm must incur substantial misclassification error, which motivates the $(\epsilon,\delta)$-DP relaxation.
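The noise-addition step can be illustrated with the classical Gaussian mechanism. The following is a minimal sketch under stated assumptions, not the calibration used in the paper: the names `gaussian_sigma` and `add_gaussian_noise` are hypothetical, the L2 sensitivity is taken as a given input (the paper derives it from a stability argument), noise is drawn independently for every matrix entry, and the formula $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\epsilon$ is the textbook bound, valid for $0 < \epsilon < 1$.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0):
        raise ValueError("the classical bound assumes 0 < epsilon < 1")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def add_gaussian_noise(matrix, sigma, rng):
    """Return a copy of `matrix` with independent N(0, sigma^2) noise on every entry."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in matrix]


# Hypothetical stand-in for an SDP solution: a 4 x 4 block matrix for two clusters.
solution = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Assumed values for illustration only; the sensitivity here is not from the paper.
sigma = gaussian_sigma(sensitivity=0.1, epsilon=0.5, delta=1e-5)
noisy_solution = add_gaussian_noise(solution, sigma, random.Random(0))
```

In the pipeline described above, a noisy matrix of this kind would then be passed to a spectral embedding followed by $k$-means; that step is omitted from this sketch.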
Abstract
We study differentially private (DP) algorithms for recovering clusters in well-clustered graphs, which are graphs whose vertex set can be partitioned into a small number of sets, each inducing a subgraph of high inner conductance and small outer conductance. Such graphs are widely used as a benchmark in the theoretical analysis of spectral clustering. We provide an efficient $(\epsilon,\delta)$-DP algorithm tailored specifically for such graphs. Our algorithm draws inspiration from the recent work of Chen et al., who developed DP algorithms for recovery of stochastic block models in cases where the graph comprises exactly two nearly-balanced clusters. Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters, and its misclassification ratio almost matches that of the best-known non-private algorithms. We conduct experimental evaluations on datasets with known ground-truth clusters to support the effectiveness of our algorithm. We also show that any (pure) $\epsilon$-DP algorithm would result in substantial error.
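To make the notion of small outer conductance concrete, here is a minimal sketch that computes it for one cluster of a toy graph. It assumes one common convention, $\phi_{\mathrm{out}}(S) = |E(S, V \setminus S)| / \mathrm{vol}(S)$ with $\mathrm{vol}(S)$ the sum of degrees in $S$; the paper's exact definition may differ, and the function name `outer_conductance` and the example graph are hypothetical.

```python
def outer_conductance(adj, cluster):
    """Edges leaving `cluster` divided by its volume (sum of vertex degrees).

    `adj` maps each vertex to a list of neighbours in an undirected simple graph.
    """
    cluster = set(cluster)
    volume = sum(len(adj[v]) for v in cluster)
    if volume == 0:
        raise ValueError("cluster has zero volume")
    # Each crossing edge has exactly one endpoint in the cluster, so it is counted once.
    cut = sum(1 for v in cluster for u in adj[v] if u not in cluster)
    return cut / volume


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
adj = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}

phi = outer_conductance(adj, {0, 1, 2})  # one crossing edge, volume 2 + 2 + 3 = 7
```

Each triangle is densely connected internally and has a single outgoing edge, so the two triangles form the kind of partition the well-clustered model describes.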
