Jigsaw Game: Federated Clustering

Jinxuan Xu; Hong-You Chen; Wei-Lun Chao; Yuqian Zhang

TL;DR

The paper tackles federated clustering for unlabeled data by formulating a federated $k$-means objective $G(\mathcal{C})=\sum_m G_m(\mathcal{C})$ and introducing FeCA, a one-shot method that refines each client’s local centroids and aggregates them at the server to recover the global centroids $\mathcal{C}^*$. It exploits the structured nature of local solutions (one-to-many and many-to-one associations) under separation conditions, aided by a RadiusAssign/ServerAggregation pipeline and theoretical guarantees under the Stochastic Ball Model. The authors extend FeCA to DeepFeCA for federated unsupervised representation learning by iterating with DeepCluster-inspired pseudo-labeling, yielding competitive results on CIFAR and Tiny-ImageNet in federated settings. Empirical results across synthetic and real datasets show FeCA’s robustness to non-IID data and its ability to recover global centroids in a single round, often surpassing centralized baselines due to leveraging diverse local solutions. Overall, the approach offers a strong, communication-efficient framework for federated unsupervised learning with practical impact on privacy-preserving clustering and representation learning.
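The minimal NumPy sketch below makes the decomposition $G(\mathcal{C})=\sum_m G_m(\mathcal{C})$ concrete. It assumes that each client term $G_m$ is the unnormalized $k$-means cost, meaning the sum of squared distances from client $m$'s points to their nearest centroid in $\mathcal{C}$. The paper's exact normalization may differ, and the function names are hypothetical.

```python
import numpy as np


def local_kmeans_cost(X_m, C):
    """Hypothetical G_m(C): sum over client m's points of the squared distance to the nearest centroid."""
    # Pairwise squared distances between points and centroids, shape (n_m, k).
    d2 = ((X_m[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())


def federated_kmeans_cost(client_data, C):
    """Hypothetical G(C) = sum_m G_m(C); each term is computable on its own client."""
    return sum(local_kmeans_cost(X_m, C) for X_m in client_data)


# Example: two clients holding disjoint parts of one dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
clients = [X[:40], X[40:]]
C = np.array([[0.0, 0.0], [1.0, 1.0]])
print(federated_kmeans_cost(clients, C), local_kmeans_cost(X, C))
```

Under this assumption, the federated objective equals the centralized $k$-means cost on the pooled data. The global centroids $\mathcal{C}^*$ are therefore well defined even though no single client sees all the data.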

Abstract

Federated learning has recently garnered significant attention, especially within the domain of supervised learning. However, despite the abundance of unlabeled data on end-users, unsupervised learning problems such as clustering in the federated setting remain underexplored. In this paper, we investigate the federated clustering problem, with a focus on federated k-means. We outline the challenge posed by its non-convex objective and data heterogeneity in the federated framework. To tackle these challenges, we adopt a new perspective by studying the structures of local solutions in k-means and propose a one-shot algorithm called FeCA (Federated Centroid Aggregation). FeCA adaptively refines local solutions on clients, then aggregates these refined solutions to recover the global solution of the entire dataset in a single round. We empirically demonstrate the robustness of FeCA under various federated scenarios on both synthetic and real-world data. Additionally, we extend FeCA to representation learning and present DeepFeCA, which combines DeepCluster and FeCA for unsupervised feature learning in the federated setting.
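The one-shot flow described above has three parts: local clustering on each client, a single upload of centroids, and aggregation on the server. The sketch below illustrates that flow in a deliberately simplified form. It is not FeCA. It omits the adaptive client-side refinement that removes one-fit-many centroids, and it replaces the RadiusAssign step with a fixed, user-supplied `radius`. Local $k$-means here is a plain Lloyd iteration with random initialization, and all function names are hypothetical.

```python
import numpy as np


def lloyd_kmeans(X, k, rng, iters=50):
    """Plain Lloyd's k-means with random initialization (a simplifying assumption)."""
    X = np.asarray(X, dtype=float)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):  # leave an empty cluster's centroid where it is
                C[j] = pts.mean(axis=0)
    return C


def server_aggregate(centroids, radius):
    """Greedily group uploaded centroids within `radius` of a seed and average each group."""
    centroids = np.asarray(centroids, dtype=float)
    remaining = list(range(len(centroids)))
    recovered = []
    while remaining:
        seed = centroids[remaining[0]]
        members = [i for i in remaining if np.linalg.norm(centroids[i] - seed) <= radius]
        recovered.append(centroids[members].mean(axis=0))
        remaining = [i for i in remaining if i not in members]
    return np.array(recovered)


def one_shot_clustering(client_data, k, radius, seed=0):
    """One communication round: each client uploads its local centroids once; the server aggregates."""
    rng = np.random.default_rng(seed)
    uploaded = np.vstack([lloyd_kmeans(X_m, k, rng) for X_m in client_data])
    return server_aggregate(uploaded, radius)
```

Each client sends only its $k$ centroids, once, so communication takes a single round. The naive version has a clear weak point. With non-IID data or poor initialization, local $k$-means can return centroids that straddle several true clusters. Such centroids lie between true centers and would survive the greedy grouping as spurious recovered centroids. FeCA's client refinement and radius assignment are designed to handle these local solutions.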

Paper Structure

This paper contains 34 sections, 4 theorems, 49 equations, 18 figures, 10 tables, and 6 algorithms.

Introduction
Related Work
Background
Structure of Local Solutions
Jigsaw Game -- FeCA
Privacy concern.
Client Update Algorithm
Radius Assign Algorithm
Server Aggregation Algorithm
Theoretical Analysis
Discussions on Heterogeneity
DeepFeCA
Experiments
FeCA Evaluation
On synthetic datasets.
...and 19 more sections

Key Result

Theorem 4.1

(Main Theorem) Under the Stochastic Ball Model, for some constants $\lambda\geq 3$ and $\eta \geq 5$, if … then, using the radius determined by the RadiusAssign algorithm, any output centroid $c_s^*$ of FeCA is close to some ground-truth center: …
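The theorem is stated under the Stochastic Ball Model, in which each cluster's points come from a ball around a ground-truth center. The minimal sampler below assumes that points are uniform on balls of a common radius and that clusters have equal sizes; the paper's model may allow other distributions. The function names are hypothetical. A uniform direction is drawn by normalizing a Gaussian vector, and the distance from the center is proportional to $U^{1/d}$, which gives a uniform density over the ball's volume. The helper `min_center_separation` only reports the smallest distance between centers. It does not reproduce the exact separation condition of Theorem 4.1.

```python
import numpy as np


def sample_stochastic_ball_model(centers, n_per_ball, radius=1.0, seed=0):
    """Draw n_per_ball points uniformly from a ball of `radius` around each center (assumed model)."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    _, d = centers.shape
    points, labels = [], []
    for j, theta in enumerate(centers):
        # Uniform direction on the sphere: normalize a standard Gaussian vector.
        g = rng.normal(size=(n_per_ball, d))
        directions = g / np.linalg.norm(g, axis=1, keepdims=True)
        # Uniform in volume: distance from the center scales as U^(1/d).
        r = radius * rng.uniform(size=(n_per_ball, 1)) ** (1.0 / d)
        points.append(theta + r * directions)
        labels.append(np.full(n_per_ball, j))
    return np.vstack(points), np.concatenate(labels)


def min_center_separation(centers):
    """Smallest pairwise distance between ground-truth centers."""
    centers = np.asarray(centers, dtype=float)
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # ignore each center's distance to itself
    return float(dists.min())
```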

Figures (18)

Figure 1: Clustering results. (Left): global and local solutions on centralized/IID client's data; (Right): global solutions for non-IID client's data sharing similar structures.
Figure 2: FeCA roadmap. 1st column: The centralized dataset distributed to clients. 2nd column: The $k$-means clustering results on different clients under the non-IID data sample scenario, where black triangles and squares represent centroids. 3rd column: Elimination of one-fit-many centroids by the client update algorithm, indicated by hollow squares and triangles. 4th column: Centroids sent to the server. 5th column: Aggregation of received centroids on the server, where red crosses represent recovered centroids.
Figure 3: Illustration of the $\ell_2$-distance results from the synthetic-data experiments, over 10 random seeds.
Figure 4: Visualizations of S-sets (S1&S4) and recovered centroids by different methods. Results are showcased under the Dirichlet($0.3$) data sample scenario. Blue dots represent recovered centroids, and red crosses indicate the ground truth centers.
Figure 5: Evaluation of $\sigma$ on S-sets (S1) across three data sample scenarios. Values of $\sigma_i$ for the $k$ clusters are shown in different colors. The values of $\sigma_i$ for all returned centroids $c_i$ are reported over $3$ random runs, with the red star marking the maximum $\sigma_i$ observed across the three runs. A $\sigma_i$ value below $0.5$ indicates that the server effectively groups centroid $c_i$ using the radius $r_s$ assigned by the empirical variant of the RadiusAssign algorithm.
...and 13 more figures
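Figure 4 refers to a Dirichlet($0.3$) data sample scenario. The sketch below shows a common way to build such non-IID client splits in the federated learning literature. For each cluster label, client shares are drawn from $\mathrm{Dir}(\alpha\mathbf{1})$ and that label's indices are split accordingly. This is an assumption about the protocol, not the paper's confirmed procedure, and `dirichlet_partition` is a hypothetical name.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients; smaller alpha gives more skewed (non-IID) splits."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        # Share of this label assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for m, part in enumerate(np.split(idx, cuts)):
            client_indices[m].extend(part.tolist())
    return [np.array(sorted(ix), dtype=int) for ix in client_indices]


labels = np.repeat(np.arange(3), 50)  # 3 clusters, 50 points each
parts = dirichlet_partition(labels, num_clients=4, alpha=0.3)
print([len(p) for p in parts])
```

Every index goes to exactly one client. With $\alpha = 0.3$, most clients typically receive only a few of the clusters, which is the kind of heterogeneity under which local $k$-means solutions differ from the global one.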

Theorems & Definitions (9)

Theorem 4.1
Remark 6.1
Lemma A.1
proof
Lemma A.2
proof
Lemma A.3
proof
proof
