Towards Federated Clustering: A Client-wise Private Graph Aggregation Framework

Guanxiong He, Jie Wang, Liaoyuan Tang, Zheng Wang, Rong Wang, Feiping Nie

TL;DR

This paper tackles privacy-preserving federated clustering on decentralized unlabeled data by introducing Structural Privacy-Preserving Federated Graph Clustering (SPP-FGC). The core idea is for each client to build a private structural graph $G^p=(V^p,E^p)$ and for the server to aggregate these into a global graph for robust clustering, using differential privacy to protect model details. A key advancement is the extension SPP-FGC+ that adds self-supervised feature learning (VAE+DEC) to iteratively refine embeddings and improve clustering on complex data, while maintaining privacy via DP-protected prototypes and a Laplacian-rank constrained global graph. The methods achieve up to a 10% improvement in NMI over federated baselines, demonstrate resilience to data heterogeneity, and provide a scalable, communication-efficient solution with provable privacy guarantees. The framework offers practical impact for privacy-sensitive domains by enabling collaborative clustering without exposing raw data or sensitive embeddings.
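The Laplacian-rank constraint mentioned above rests on a standard fact from spectral graph theory: a graph has exactly $k$ connected components if and only if its Laplacian has exactly $k$ zero eigenvalues, so its rank is $n-k$. The sketch below only illustrates that property on a toy affinity matrix. It is not the paper's optimization procedure, and the function name and tolerance are illustrative assumptions.

```python
import numpy as np

def count_zero_laplacian_eigs(W, tol=1e-8):
    """Count near-zero eigenvalues of the unnormalized Laplacian L = D - W.

    For a non-negative symmetric affinity matrix W, this count equals the
    number of connected components of the graph. (Hypothetical helper.)
    """
    W = (W + W.T) / 2.0                    # enforce symmetry
    L = np.diag(W.sum(axis=1)) - W         # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)        # real eigenvalues, ascending order
    return int(np.sum(eigvals < tol))

# Toy graph with two disconnected groups: nodes {0, 1, 2} and nodes {3, 4}.
W = np.zeros((5, 5))
W[0, 1] = W[1, 2] = W[0, 2] = 1.0
W[3, 4] = 1.0
W = W + W.T

k_separated = count_zero_laplacian_eigs(W)

# Adding a single bridging edge merges the two groups into one component.
W_bridged = W.copy()
W_bridged[2, 3] = W_bridged[3, 2] = 0.5
k_bridged = count_zero_laplacian_eigs(W_bridged)

print(k_separated, k_bridged)
```

Constraining the rank of the global graph's Laplacian to $n-k$ therefore forces the graph to split into exactly $k$ groups, which can then be read off directly as clusters.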

Abstract

Federated clustering addresses the critical challenge of extracting patterns from decentralized, unlabeled data. However, current approaches are forced to trade performance against privacy: transmitting embedding representations risks sensitive data leakage, while sharing only abstract cluster prototypes leads to diminished model accuracy. To resolve this dilemma, we propose Structural Privacy-Preserving Federated Graph Clustering (SPP-FGC), a novel algorithm that uses local structural graphs as the primary medium for privacy-preserving knowledge sharing, thus moving beyond the limitations of conventional techniques. Our framework follows a clear client-server logic: on the client side, each participant constructs a private structural graph that captures intrinsic data relationships; the server then securely aggregates and aligns these graphs into a comprehensive global graph, from which a unified clustering structure is derived. The framework offers two distinct modes to suit different needs. SPP-FGC is an efficient one-shot method that completes its task in a single communication round, ideal for rapid analysis. For more complex, unstructured data such as images, SPP-FGC+ employs an iterative process in which clients and the server collaboratively refine feature representations to achieve superior downstream performance. Extensive experiments demonstrate that our framework achieves state-of-the-art performance, improving clustering accuracy by up to 10% (NMI) over federated baselines while maintaining provable privacy guarantees.
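For readers unfamiliar with the NMI metric quoted above, the following is a minimal sketch of how normalized mutual information between two labelings can be computed. The arithmetic-mean normalization $2I(A;B)/(H(A)+H(B))$ is an assumption here, since the text does not state which NMI variant is used, and the function name is illustrative.

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information, arithmetic-mean variant (assumed)."""
    a = np.asarray(labels_a).ravel()
    b = np.asarray(labels_b).ravel()
    n = a.size

    # Map arbitrary label values to contiguous indices 0..k-1.
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)

    # Contingency table: joint counts of (label in a, label in b).
    cont = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(cont, (a_idx, b_idx), 1)

    p_ab = cont / n                 # joint distribution
    p_a = p_ab.sum(axis=1)          # marginal over labels in a
    p_b = p_ab.sum(axis=0)          # marginal over labels in b

    # Mutual information, summed over non-empty cells only.
    nz = p_ab > 0
    mi = np.sum(p_ab[nz] * np.log(p_ab[nz] / np.outer(p_a, p_b)[nz]))

    # Entropies of the two labelings (marginals are strictly positive).
    h_a = -np.sum(p_a * np.log(p_a))
    h_b = -np.sum(p_b * np.log(p_b))

    if h_a + h_b == 0:              # both labelings are a single cluster
        return 1.0
    return float(2.0 * mi / (h_a + h_b))

# NMI ignores how clusters are named: swapping label ids gives a perfect score.
score_same = nmi([0, 0, 1, 1], [1, 1, 0, 0])
# Two labelings that are statistically independent score zero.
score_indep = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Because NMI is invariant to label permutation, it suits federated clustering, where cluster ids assigned by the server have no fixed correspondence to ground-truth class ids.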

Paper Structure

This paper contains 30 sections, 47 equations, 6 figures, 5 tables, 2 algorithms.

Figures (6)

  • Figure 1: Classic FC paradigms and the proposed Federated Graph Clustering framework. The left shows the paradigms of Model Averaging, Embedding Sharing, and Prototype Aggregation, while the right highlights our graph-based approach.
  • Figure 2: Overview of SPP-FGC+. Clients generate private structural graphs from learned features and upload them to the server. The server fuses these inputs into a global graph, derives new cluster prototypes, and sends them back as feedback.
  • Figure 3: Clustering performance measured by ACC and NMI. Left: changes over the course of iterations. Right: changes as the number of clients increases.
  • Figure 4: Visualization of transmitted prototypes from k-FED (left column), PPFC-GAN (middle column), and SPP-FGC (right column). This figure illustrates the enhanced privacy preservation achieved by our proposed SPP-FGC algorithm.
  • Figure 5: Visualization of the learned structural matrices. (a) and (b) show the private graphs from two clients. (c) shows the aggregated global graph. (d) shows the final decision similarity graph.
  • ...and 1 more figure