CCFC++: Enhancing Federated Clustering through Feature Decorrelation
Jie Yan, Jing Liu, Yi-Zi Ning, Zhong-Yuan Zhang
TL;DR
This paper tackles data heterogeneity in federated clustering by analyzing how it induces dimensional collapse in CCFC representations. It proposes CCFC++, a decorrelation-regularized extension of CCFC, to reduce correlations between representation dimensions and mitigate collapse. The authors provide gradient-flow theory linking heterogeneity to low-rank encoder dynamics and validate the approach on MNIST, Fashion-MNIST, CIFAR-10, and STL-10, reporting NMI gains of up to 0.32 and improved robustness to device failures. Overall, the work advances unsupervised federated learning by stabilizing representations under heterogeneity and improving clustering performance.
Abstract
In federated clustering, multiple data-holding clients collaboratively group data without exchanging raw data. This field has seen notable advancements through its marriage with contrastive learning, exemplified by Cluster-Contrastive Federated Clustering (CCFC). However, CCFC degrades when data are heterogeneous across clients, leading to poor and non-robust performance. Our study conducts both empirical and theoretical analyses to understand the impact of heterogeneous data on CCFC. The findings indicate that increased data heterogeneity exacerbates dimensional collapse in CCFC, as evidenced by increased correlations across multiple dimensions of the learned representations. To address this, we add a decorrelation regularizer to CCFC. With this regularizer, the improved method effectively mitigates the detrimental effects of data heterogeneity and achieves superior performance, with NMI gains reaching as high as 0.32 in the most pronounced case.
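The abstract does not give the exact form of the decorrelation regularizer, so the following is only a minimal sketch of one common choice: the mean squared off-diagonal entry of the correlation matrix of a batch of representations. The function name `decorrelation_penalty`, the `eps` parameter, and the normalization by d(d-1) are assumptions made for illustration, not the authors' definition.

```python
import numpy as np


def decorrelation_penalty(z, eps=1e-8):
    """Mean squared off-diagonal correlation of a batch of representations.

    z: array of shape (n, d), one d-dimensional representation per row.
    Returns a value near 0 when dimensions are uncorrelated and near 1
    when every pair of dimensions is perfectly (anti-)correlated.
    NOTE: hypothetical form, assumed for illustration only.
    """
    z = np.asarray(z, dtype=np.float64)
    n, d = z.shape
    if d < 2:
        return 0.0  # a single dimension has no cross-correlations
    # Standardize each dimension to zero mean and unit variance.
    z = z - z.mean(axis=0, keepdims=True)
    z = z / (z.std(axis=0, keepdims=True) + eps)
    # Empirical d x d correlation matrix.
    corr = z.T @ z / n
    # Keep only off-diagonal entries; the diagonal is 1 by construction.
    off_diag = corr - np.diag(np.diag(corr))
    return float(np.sum(off_diag ** 2) / (d * (d - 1)))


rng = np.random.default_rng(0)

# Collapsed representations: every dimension is a rescaled copy of one signal.
base = rng.normal(size=1000)
collapsed = base[:, None] * np.array([1.0, 2.0, -3.0])

# Healthy representations: dimensions drawn independently.
independent = rng.normal(size=(5000, 3))

penalty_collapsed = decorrelation_penalty(collapsed)
penalty_independent = decorrelation_penalty(independent)
print(penalty_collapsed, penalty_independent)
```

In a training loop, a penalty of this kind would be added to each client's local objective with a weighting coefficient, so that minimizing it pushes the representation dimensions toward being uncorrelated. The collapsed batch above illustrates the symptom the abstract describes: strong correlations across dimensions yield a large penalty, while independent dimensions yield a small one.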
