Topological Federated Clustering via Gravitational Potential Fields under Local Differential Privacy

Yunbo Long, Jiaquan Zhang, Xi Chen, Alexandra Brintrup

TL;DR

This work addresses clustering of distributed data under local differential privacy in federated settings, where one-shot approaches struggle under strong LDP. It introduces Gravitational Federated Clustering (GFC), reframing clustering as a topological persistence problem in a gravitational potential field derived from privatized client centroids encoded as masses. The key contributions are a client-side compactness-aware perturbation mechanism and a server-side topological aggregation that identifies stable global centers via persistent homology, with a theoretical bound tying the privacy budget to centroid error. Empirically, GFC achieves substantial improvements over state-of-the-art one-shot methods, especially for ε < 1, while eliminating the need for iterative communication and offering a new topology-informed perspective for privacy-preserving distributed learning.

Abstract

Clustering non-independent and identically distributed (non-IID) data under local differential privacy (LDP) in federated settings presents a critical challenge: preserving privacy while maintaining accuracy without iterative communication. Existing one-shot methods rely on unstable pairwise centroid distances or neighborhood rankings, degrading severely under strong LDP noise and data heterogeneity. We present Gravitational Federated Clustering (GFC), a novel approach to privacy-preserving federated clustering that overcomes the limitations of distance-based methods under varying LDP. Addressing the critical challenge of clustering non-IID data with diverse privacy guarantees, GFC transforms privatized client centroids into a global gravitational potential field where true cluster centers emerge as topologically persistent singularities. Our framework introduces two key innovations: (1) a client-side compactness-aware perturbation mechanism that encodes local cluster geometry as "mass" values, and (2) a server-side topological aggregation phase that extracts stable centroids through persistent homology analysis of the potential field's superlevel sets. Theoretically, we establish a closed-form bound between the privacy budget $\varepsilon$ and centroid estimation error, proving the potential field's Lipschitz smoothing properties exponentially suppress noise in high-density regions. Empirically, GFC outperforms state-of-the-art methods on ten benchmarks, especially under strong LDP constraints ($\varepsilon < 1$), while maintaining comparable performance at lower privacy budgets. By reformulating federated clustering as a topological persistence problem in a synthetic physics-inspired space, GFC achieves unprecedented privacy-accuracy trade-offs without iterative communication, providing a new perspective for privacy-preserving distributed learning.
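The pipeline described in the abstract (privatized centroids with masses, a potential field, persistent peaks) can be illustrated with a minimal sketch. This is not the authors' implementation. The following are all assumptions made for illustration: the per-coordinate Laplace mechanism in `privatize`, the softened inverse-distance potential in `potential`, and the `radius`, `softening`, and `min_persistence` parameters. The paper's persistent homology step is replaced here by a simpler stand-in: 0-dimensional superlevel-set persistence on a radius neighborhood graph, computed with a union-find sweep.

```python
import math
import random


def laplace_noise(scale, rng):
    # Difference of two i.i.d. Exp(1) draws is Laplace(0, 1); rescale it.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def privatize(centroid, epsilon, sensitivity, rng):
    # Assumed client-side step: standard Laplace mechanism per coordinate.
    scale = sensitivity / epsilon
    return tuple(c + laplace_noise(scale, rng) for c in centroid)


def potential(points, masses, softening=0.5):
    # Assumed field: sum of m_j / sqrt(d^2 + softening^2) at each point.
    field = []
    for p in points:
        total = 0.0
        for q, m in zip(points, masses):
            total += m / math.sqrt(math.dist(p, q) ** 2 + softening ** 2)
        field.append(total)
    return field


def persistent_peaks(points, field, radius, min_persistence):
    # Sweep points from highest to lowest field value. Each point starts its
    # own component; when two components meet, the one with the lower peak
    # dies at the current level. The root of a component is always its peak.
    order = sorted(range(len(points)), key=lambda i: -field[i])
    parent = {}
    deaths = {}  # peak index -> field value at which it merged away

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        for j in list(parent):
            if j == i or math.dist(points[i], points[j]) > radius:
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            hi, lo = (ri, rj) if field[ri] >= field[rj] else (rj, ri)
            deaths[lo] = field[i]
            parent[lo] = hi

    peaks = []
    for i in order:
        # Peaks that never merge have infinite persistence.
        persistence = field[i] - deaths.get(i, -math.inf)
        if persistence >= min_persistence:
            peaks.append((points[i], persistence))
    return peaks


if __name__ == "__main__":
    rng = random.Random(0)
    true_centers = [(0.0, 0.0), (10.0, 10.0)]
    # Five hypothetical clients each report one noisy copy of every center.
    reports = [privatize(c, epsilon=2.0, sensitivity=1.0, rng=rng)
               for c in true_centers for _ in range(5)]
    masses = [1.0] * len(reports)
    field = potential(reports, masses)
    for location, persistence in persistent_peaks(reports, field, 3.0, 0.5):
        print(location, persistence)
```

In this sketch, a report that lands next to a higher-potential neighbor is absorbed with near-zero persistence, so only peaks that stand out from their surroundings survive the `min_persistence` threshold. How the masses are derived from local cluster compactness is left out, since the summary above does not specify it.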

Paper Structure

This paper contains 37 sections, 11 equations, 8 figures, 6 tables, 5 algorithms.

Figures (8)

  • Figure 1: Gravitational Federated Clustering Pipeline.
  • Figure 2: Examples of Topological Analysis for GFC on MNIST Data Visualized via UMAP Projection
  • Figure 3: Performance comparison of GFC, NN-FC, K-Fed, and MUFC across varying privacy budgets $\varepsilon$ (smaller $\varepsilon$ denotes stronger privacy). Mean ARI scores are shown in (a, c, e, g), while corresponding mean NMI scores are in (b, d, f, h). GFC results are highlighted in red.
  • Figure 4: Impact of $\delta$ for GFC on MNIST Data
  • Figure 5: Topological Analysis for GFC on MNIST Data under a Privacy Budget of $\varepsilon = 0.01$, Visualized via UMAP Projection
  • ...and 3 more figures