A Differentially Private Clustering Algorithm for Well-Clustered Graphs

Weiqiang He, Hendrik Fichtenberger, Pan Peng

TL;DR

This work tackles differentially private clustering on well-clustered graphs, defining clusters via inner and outer conductance in a $c$-balanced $(k,\phi_{\mathrm{in}},\phi_{\mathrm{out}})$ model. It introduces an SDP-based DP clustering pipeline that adds Gaussian noise to a structured SDP solution and then applies a spectral embedding followed by $k$-means to recover a $k$-partition, achieving utility close to non-private baselines. The analysis uses the stability of generalized strongly convex optimization to bound sensitivity, yielding $(\epsilon,\delta)$-DP guarantees and a concrete utility bound that scales with the graph parameters, edge density, and privacy budget. An experimental evaluation on SBM data demonstrates improved privacy-utility trade-offs relative to a DP baseline, and a lower-bound result shows that any pure $\epsilon$-DP algorithm incurs substantial misclassification error, which motivates the $(\epsilon,\delta)$-DP relaxation.

Abstract

We study differentially private (DP) algorithms for recovering clusters in well-clustered graphs, which are graphs whose vertex set can be partitioned into a small number of sets, each inducing a subgraph of high inner conductance and small outer conductance. Such graphs have widespread application as a benchmark in the theoretical analysis of spectral clustering. We provide an efficient $(\epsilon,\delta)$-DP algorithm tailored specifically for such graphs. Our algorithm draws inspiration from the recent work of Chen et al., who developed DP algorithms for recovery of stochastic block models in cases where the graph comprises exactly two nearly-balanced clusters. Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters, and the misclassification ratio almost matches the one of the best-known non-private algorithms. We conduct experimental evaluations on datasets with known ground truth clusters to substantiate the prowess of our algorithm. We also show that any (pure) $\epsilon$-DP algorithm would result in substantial error.
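
To make the notions of inner and outer conductance concrete, the following minimal sketch computes both on a toy graph. It assumes the common definition $\phi(S) = |E(S, V\setminus S)| / \min(\mathrm{vol}(S), \mathrm{vol}(V\setminus S))$, which may differ in detail from the paper's Definition 1.1, and it takes the inner conductance of a cluster to be the minimum conductance over all cuts of the induced subgraph, found here by brute force. All function names are illustrative and do not come from the paper.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def volume(adj, nodes):
    """Sum of degrees of the given nodes."""
    return sum(len(adj[v]) for v in nodes)


def conductance(adj, subset):
    """Assumed definition: cut(S) / min(vol(S), vol(V \\ S))."""
    subset = set(subset)
    rest = set(adj) - subset
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    denom = min(volume(adj, subset), volume(adj, rest))
    # Convention chosen for this sketch: a side with zero volume gives 0.
    return cut / denom if denom > 0 else 0.0


def outer_conductance(adj, cluster):
    """Conductance of the cluster as a cut of the whole graph."""
    return conductance(adj, cluster)


def inner_conductance(adj, cluster):
    """Minimum conductance over all cuts of the induced subgraph.

    Brute force over subsets, so only suitable for tiny clusters.
    """
    cluster = set(cluster)
    sub = {u: adj[u] & cluster for u in cluster}
    nodes = list(sub)
    best = float("inf")
    for size in range(1, len(nodes)):
        for subset in combinations(nodes, size):
            best = min(best, conductance(sub, subset))
    return best


# Toy graph: two triangles joined by the single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = build_adjacency(edges)
phi_out = outer_conductance(adj, {0, 1, 2})
phi_in = inner_conductance(adj, {0, 1, 2})
print(phi_out, phi_in)
```

On this toy graph each triangle is well connected internally and only one edge leaves it, which is the qualitative shape of a well-clustered graph: high inner conductance and small outer conductance for every cluster.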

Paper Structure

This paper contains 16 sections, 19 theorems, 31 equations, 1 table, 1 algorithm.

Key Result

Theorem 1

Let $G=(V,E)$ be a $c$-balanced $(k,\phi_\text{in},\phi_\text{out})$-clusterable graph with its ground truth partition $\{C_i\}_{i\in [k]}$, where $\frac{\phi_\text{out}}{\phi_\text{in}^2}=O(k^{-4})$. Then, there exists an algorithm that, for any $c,k,\phi_\text{in},\phi_\text{out}$ and graph $G$ [...] with probability $1-\exp(-\Omega(n))$, where $\sigma$ is a permutation over $[k]:=\{1,\dots,k\}$. [...]
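
The permutation $\sigma$ over $[k]$ appears because cluster labels are only meaningful up to relabeling. As a hedged illustration, the sketch below scores a predicted $k$-partition against the ground truth by minimizing the fraction of mislabeled vertices over all label permutations. The exact error measure used in Theorem 1 is cut off in the text above, so the vertex-count fraction used here is an assumption, and the function name is illustrative. The brute-force search over $k!$ permutations is only practical for small $k$.

```python
from itertools import permutations


def misclassification_ratio(truth, predicted, k):
    """Smallest fraction of mislabeled vertices over all relabelings.

    truth and predicted are equal-length lists of labels in range(k).
    sigma maps each predicted label to a ground-truth label.
    """
    if len(truth) != len(predicted):
        raise ValueError("label lists must have the same length")
    n = len(truth)
    best_errors = n
    for sigma in permutations(range(k)):
        errors = sum(1 for t, p in zip(truth, predicted) if sigma[p] != t)
        best_errors = min(best_errors, errors)
    return best_errors / n


# Nine vertices in three clusters; the prediction uses different label
# names and places one vertex (index 5) in the wrong cluster.
truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
predicted = [2, 2, 2, 0, 0, 1, 1, 1, 1]
ratio = misclassification_ratio(truth, predicted, k=3)
print(ratio)
```

Here the best relabeling maps predicted label 2 to cluster 0, label 0 to cluster 1, and label 1 to cluster 2, leaving a single misplaced vertex out of nine.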

Theorems & Definitions (32)

  • Definition 1.1: Well-clustered graph
  • Theorem 1
  • Theorem 2: informal
  • Definition 2.1: Differential privacy
  • Definition 2.2: Sensitivity of a function
  • Lemma 2.3: Gaussian mechanism
  • Lemma 2.4: Concentration of spectral norm of Gaussian matrices
  • Lemma 2.5: Weyl's inequality
  • Lemma 2.6: Davis-Kahan $\sin(\theta)$-Theorem [davis1970rotation]
  • Lemma 2.6: [peng2015partitioning]
  • ...and 22 more
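
Of the tools listed above, the Gaussian mechanism is the one that turns a sensitivity bound into an $(\epsilon,\delta)$-DP guarantee. The paper's own statement of Lemma 2.3 is not reproduced here, so the following minimal sketch assumes the classical calibration $\sigma = \Delta_2 \sqrt{2\ln(1.25/\delta)}/\epsilon$, which is known to be valid for $0<\epsilon<1$, where $\Delta_2$ is the $\ell_2$-sensitivity of the released vector. The function names and the flat list representation are illustrative choices and not the paper's implementation.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale under the classical calibration (assumes 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("classical calibration assumes 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if sensitivity < 0.0:
        raise ValueError("sensitivity must be non-negative")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(values, sensitivity, epsilon, delta, rng=None):
    """Add i.i.d. N(0, sigma^2) noise to every coordinate of values."""
    rng = rng if rng is not None else random.Random()
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical query output with l2-sensitivity 1, released under
# (0.5, 1e-5)-DP. A seeded generator keeps the example reproducible.
sigma = gaussian_sigma(sensitivity=1.0, epsilon=0.5, delta=1e-5)
noisy = gaussian_mechanism([0.2, 0.7, 0.1], 1.0, 0.5, 1e-5,
                           rng=random.Random(0))
print(sigma, noisy)
```

The noise scale grows linearly with the sensitivity and with $1/\epsilon$, which is why bounding the sensitivity of the quantity being released matters for utility: a smaller sensitivity means less noise at the same privacy budget.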