Asynchronous Federated Clustering with Unknown Number of Clusters

Yunfan Zhang, Yiqun Zhang, Yang Lu, Mengke Li, Xi Chen, Yiu-ming Cheung

TL;DR

The paper tackles privacy-preserving federated clustering on non-IID data with asynchronous client participation and an unknown global cluster count $k^*$. It introduces Asynchronous Federated Cluster Learning (AFCL), a seed-based framework that uses Client-Side Update Accumulation (CSUA) and Server-Side Seeds Interaction (SSSI) to fuse distributed cluster information without predefining $k^*$. A balancing mechanism mitigates bias from uneven client participation, enabling seeds to converge to a meaningful global clustering structure. Experiments on 13 datasets show AFCL outperforms or matches baselines under asynchronous, non-IID settings, demonstrating robustness and practical applicability of seed-based consensus for FC.

Abstract

Federated Clustering (FC) is crucial to mining knowledge from unlabeled, non-Independent and Identically Distributed (non-IID) data provided by multiple clients while preserving their privacy. Most existing approaches learn cluster distributions at the local clients and then securely pass the desensitized information to the server for aggregation. However, some tricky but common FC problems remain relatively unexplored, including heterogeneity in clients' communication capacity and the unknown number of proper clusters $k^*$. To further bridge the gap between FC and real application scenarios, this paper first shows that clients' communication asynchrony and the unknown $k^*$ are complex, coupled problems, and then proposes an Asynchronous Federated Cluster Learning (AFCL) method accordingly. It distributes an excessive number of seed points to the clients as a learning medium and coordinates them across the clients to form a consensus. To alleviate the distribution imbalance that accumulates from unforeseen asynchronous uploads by heterogeneous clients, we also design a balancing mechanism for seed updating. As a result, the seeds gradually adapt to each other to reveal a proper number of clusters. Extensive experiments demonstrate the efficacy of AFCL.
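
The general idea in the abstract (over-initialized seeds, client-side accumulation, server-side balancing) can be illustrated with a toy sketch. This is not the paper's AFCL algorithm: the function names `nearest`, `client_accumulate`, and `server_update` are hypothetical, the balancing rule (averaging each client's uploads before combining clients) is an assumption, and dropping seeds that attract no data is a simplified stand-in for the paper's seed interaction.

```python
import math

def nearest(seeds, x):
    """Index of the seed closest to point x (Euclidean distance)."""
    return min(range(len(seeds)), key=lambda j: math.dist(seeds[j], x))

def client_accumulate(seeds, data):
    """Client side (illustrative): sum the pull (x - seed) on each winning seed."""
    dim = len(seeds[0])
    pull = [[0.0] * dim for _ in seeds]
    wins = [0] * len(seeds)
    for x in data:
        j = nearest(seeds, x)
        wins[j] += 1
        for d in range(dim):
            pull[j][d] += x[d] - seeds[j][d]
    return pull, wins

def server_update(seeds, uploads):
    """Server side (illustrative). uploads maps client id -> list of (pull, wins).

    Assumed balancing rule: average each client's uploads first, so a client
    that uploads often carries the same weight as one that uploads once.
    Seeds that no client's data selected are dropped (assumed pruning rule).
    """
    dim = len(seeds[0])
    total_pull = [[0.0] * dim for _ in seeds]
    total_wins = [0.0] * len(seeds)
    for client_uploads in uploads.values():
        n = len(client_uploads)
        if n == 0:
            continue  # this client has not uploaded anything yet
        for j in range(len(seeds)):
            total_wins[j] += sum(w[j] for _, w in client_uploads) / n
            for d in range(dim):
                total_pull[j][d] += sum(p[j][d] for p, _ in client_uploads) / n
    new_seeds = []
    for j, seed in enumerate(seeds):
        if total_wins[j] > 0:
            new_seeds.append(tuple(
                seed[d] + total_pull[j][d] / total_wins[j] for d in range(dim)
            ))
    return new_seeds
```

In this sketch, the number of seeds returned by `server_update` plays the role of the estimated cluster count. Because each client's uploads are averaged before clients are combined, a client that uploads three identical updates pulls a shared seed no harder than a client that uploads once.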

Paper Structure (15 sections, 14 equations, 4 figures, 4 tables, 1 algorithm)

This paper contains 15 sections, 14 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: AFCL (ours) vs. typical FC. (a) Existing FC approaches typically assume that the clients are synchronous and that the optimal $k^*$ is known to both the clients and the server. By contrast, (b) AFCL learns under a more realistic scenario in which the clients can upload distribution information for completely non-overlapping and non-uniform numbers of clusters, with unforeseen and imbalanced frequencies.
  • Figure 2: Overview of the proposed AFCL framework. Initialized seed points accumulate update intensity from different clients independently, then the server balances the update information to facilitate appropriate seeds interaction for fusing the clients' distributions. The heat map represents the intensity of update information of seeds accumulated from asynchronous clients.
  • Figure 3: Seed points and their trajectories on the server during the learning of AFCL. Black and red dots indicate the initial and final positions of the seed points, respectively.
  • Figure 4: Values of the AFCL objective function on the (a) SD1, (b) IR, (c) SE, and (d) AL datasets. Red triangles mark the iterations at which the server update starts.

Theorems & Definitions (3)

  • Remark 1
  • Remark 2
  • Remark 3