Jigsaw Game: Federated Clustering
Jinxuan Xu, Hong-You Chen, Wei-Lun Chao, Yuqian Zhang
TL;DR
The paper tackles federated clustering of unlabeled data by formulating a federated $k$-means objective $G(\mathcal{C})=\sum_m G_m(\mathcal{C})$, where $G_m$ is the local objective of client $m$, and by introducing FeCA, a one-shot method that refines each client’s local centroids and aggregates them at the server to recover the global centroids $\mathcal{C}^*$. FeCA exploits the structure of local solutions (one-to-many and many-to-one associations) under separation conditions through a RadiusAssign/ServerAggregation pipeline, with theoretical guarantees under the Stochastic Ball Model. The authors extend FeCA to DeepFeCA for federated unsupervised representation learning by iterating it with DeepCluster-inspired pseudo-labeling, which yields competitive results on CIFAR and Tiny-ImageNet in federated settings. Empirical results on synthetic and real datasets show that FeCA is robust to non-IID data and recovers the global centroids in a single round, often surpassing centralized baselines because it draws on diverse local solutions. Overall, the approach is a communication-efficient framework for federated unsupervised learning, with practical relevance to privacy-preserving clustering and representation learning.
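As a minimal sketch of why the objective decomposes over clients, assuming each local term $G_m(\mathcal{C})$ is the standard unnormalized $k$-means cost on client $m$'s data, the Python below evaluates $G(\mathcal{C})$ client by client and checks that it matches the cost computed on the pooled data. The helper names `kmeans_cost` and `federated_objective` and the synthetic clients are hypothetical; the paper may normalize the local terms differently.

```python
import numpy as np

def kmeans_cost(X, C):
    """k-means cost: sum over points of the squared distance to the nearest centroid."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # shape (n_points, k)
    return float(d2.min(axis=1).sum())

def federated_objective(client_data, C):
    """G(C) = sum_m G_m(C); each term uses only client m's local data."""
    return sum(kmeans_cost(X_m, C) for X_m in client_data)

# Three synthetic clients with different (non-IID) local distributions.
rng = np.random.default_rng(0)
clients = [rng.normal(loc=mu, size=(50, 2)) for mu in ([0, 0], [5, 5], [0, 5])]
C = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])

G_fed = federated_objective(clients, C)
G_central = kmeans_cost(np.vstack(clients), C)
print(np.isclose(G_fed, G_central))  # True: the decomposition is exact
```

Each $G_m$ needs only client $m$'s data and the shared centroids, so evaluating the sum is easy; the hard part is minimizing this non-convex sum without pooling the data.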
Abstract
Federated learning has recently garnered significant attention, especially within the domain of supervised learning. However, despite the abundance of unlabeled data on end-users, unsupervised learning problems such as clustering in the federated setting remain underexplored. In this paper, we investigate the federated clustering problem, with a focus on federated k-means. We outline the challenges posed by its non-convex objective and by data heterogeneity in the federated framework. To tackle these challenges, we adopt a new perspective by studying the structures of local solutions in k-means and propose a one-shot algorithm called FeCA (Federated Centroid Aggregation). FeCA adaptively refines local solutions on clients, then aggregates these refined solutions to recover the global solution of the entire dataset in a single round. We empirically demonstrate the robustness of FeCA under various federated scenarios on both synthetic and real-world data. Additionally, we extend FeCA to representation learning and present DeepFeCA, which combines DeepCluster and FeCA for unsupervised feature learning in the federated setting.
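The one-shot client-to-server flow can be illustrated with a deliberately simplified stand-in, which is not FeCA itself: each client runs plain Lloyd's $k$-means locally and uploads only its centroids and cluster sizes, and the server runs size-weighted $k$-means on the pooled centroids. This sketch omits FeCA's adaptive local refinement and its RadiusAssign/ServerAggregation steps. All function names, the farthest-first initialization, and the toy non-IID split (each client sees two of three clusters) are assumptions made for illustration.

```python
import numpy as np

def farthest_first(X, k, weights):
    """Deterministic init (assumed): start at the heaviest point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    idx = [int(np.argmax(weights))]
    d2 = ((X - X[idx[0]]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d2))
        idx.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].astype(float)

def assign(X, C):
    # Index of the nearest centroid for every point.
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def lloyd(X, k, weights=None, iters=50):
    """Weighted Lloyd's k-means; returns centroids and the mass of each cluster."""
    weights = np.ones(len(X)) if weights is None else weights
    C = farthest_first(X, k, weights)
    for _ in range(iters):
        labels = assign(X, C)
        for j in range(k):
            mask = labels == j
            if weights[mask].sum() > 0:  # leave empty clusters where they are
                C[j] = np.average(X[mask], axis=0, weights=weights[mask])
    labels = assign(X, C)
    return C, np.bincount(labels, weights=weights, minlength=k)

def one_shot_federated_kmeans(client_data, k):
    # Client side: solve k-means locally, upload centroids + cluster sizes once.
    local_C, local_mass = [], []
    for X_m in client_data:
        C_m, n_m = lloyd(X_m, min(k, len(X_m)))
        keep = n_m > 0  # drop empty local clusters
        local_C.append(C_m[keep])
        local_mass.append(n_m[keep])
    # Server side: size-weighted k-means on the pooled local centroids.
    pooled, mass = np.vstack(local_C), np.concatenate(local_mass)
    global_C, _ = lloyd(pooled, k, weights=mass)
    return global_C

# Toy non-IID split: three well-separated clusters, each client sees only two.
rng = np.random.default_rng(1)
T = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])

def blob(center):
    return rng.normal(loc=center, scale=0.5, size=(100, 2))

clients = [np.vstack([blob(T[a]), blob(T[b])]) for a, b in [(0, 1), (1, 2), (0, 2)]]
global_C = one_shot_federated_kmeans(clients, k=3)
print(np.round(global_C, 1))
```

In this toy run each client's $k=3$ local solution splits one of its two clusters (a one-to-many association), and the size weights let the server average the split pieces back together. A local centroid that merges two true clusters (many-to-one) would instead bias this naive server step; that is the kind of structure the paper's refinement is meant to exploit.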
