Towards Federated Clustering: A Client-wise Private Graph Aggregation Framework
Guanxiong He, Jie Wang, Liaoyuan Tang, Zheng Wang, Rong Wang, Feiping Nie
TL;DR
This paper tackles privacy-preserving federated clustering on decentralized, unlabeled data with Structural Privacy-Preserving Federated Graph Clustering (SPP-FGC). Each client builds a private structural graph $G^p=(V^p,E^p)$, and the server aggregates these local graphs into a global graph for robust clustering, with differential privacy (DP) protecting the shared model details. The extension SPP-FGC+ adds self-supervised feature learning (VAE+DEC) to iteratively refine embeddings and improve clustering on complex data, while preserving privacy through DP-protected prototypes and a Laplacian-rank-constrained global graph. The methods improve NMI by up to 10% over federated baselines, remain resilient to data heterogeneity, and provide a scalable, communication-efficient solution with provable privacy guarantees. For privacy-sensitive domains, the framework enables collaborative clustering without exposing raw data or sensitive embeddings.
Abstract
Federated clustering addresses the critical challenge of extracting patterns from decentralized, unlabeled data. However, current approaches are forced to trade performance against privacy: \textit{transmitting embedding representations risks sensitive data leakage, while sharing only abstract cluster prototypes diminishes model accuracy}. To resolve this dilemma, we propose Structural Privacy-Preserving Federated Graph Clustering (SPP-FGC), a novel algorithm that uses local structural graphs as the primary medium for privacy-preserving knowledge sharing. The framework follows a client-server design: each client constructs a private structural graph that captures intrinsic data relationships, and the server securely aggregates and aligns these graphs into a comprehensive global graph from which a unified clustering structure is derived. The framework offers two modes. SPP-FGC is an efficient one-shot method that completes in a single communication round, making it well suited to rapid analysis. For more complex, unstructured data such as images, SPP-FGC+ employs an iterative process in which clients and the server collaboratively refine feature representations to achieve superior downstream performance. Extensive experiments demonstrate that our framework achieves state-of-the-art performance, improving clustering accuracy by up to 10\% (NMI) over federated baselines while maintaining provable privacy guarantees.
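
To make concrete how a clustering structure can be read off a graph, the following minimal sketch illustrates the standard spectral fact behind Laplacian-rank constraints: a graph with nonnegative weights has exactly $c$ connected components if and only if its Laplacian $L = D - A$ has eigenvalue 0 with multiplicity $c$, i.e. $\mathrm{rank}(L) = n - c$. This is not the SPP-FGC aggregation or alignment procedure, which the abstract does not specify. The k-nearest-neighbour graph construction, the helper names `knn_graph` and `count_components`, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np


def knn_graph(X, k):
    """Symmetric, binary k-nearest-neighbour affinity matrix (illustrative choice)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(A, A.T)  # keep an edge if either endpoint selected it


def count_components(A, tol=1e-8):
    """Connected components = multiplicity of eigenvalue 0 of L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    eigvals = np.linalg.eigvalsh(L)  # L is symmetric, so eigvalsh applies
    return int((eigvals < tol).sum())


# Hypothetical toy data: two well-separated groups on a line.
X = np.array([[0.0], [1.0], [2.0], [3.0], [100.0], [101.0], [102.0], [103.0]])
A = knn_graph(X, k=2)
n_clusters = count_components(A)
laplacian_rank = X.shape[0] - n_clusters
```

In this toy case no neighbour edge crosses the gap between the two groups, so the graph splits into two components and the Laplacian has rank $n - 2$. A rank-constrained method would enforce this property on a learned graph rather than merely observe it.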
