Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data
Xinting Liao, Weiming Liu, Chaochao Chen, Pengyang Zhou, Fengyuan Yu, Huabin Zhu, Binhui Yao, Tao Wang, Xiaolin Zheng, Yanchao Tan
TL;DR
This paper tackles federated unsupervised learning with non-IID data by addressing two core issues: representation collapse across local and global models, and misaligned representation spaces among clients. It proposes FedU2, a framework combining a Flexible Uniform Regularizer (FUR) and an Efficient Unified Aggregator (EUA) to enforce uniform and unified representations, respectively. The method uses unbalanced optimal transport to push local representations toward a spherical Gaussian and a multi-objective ADMM-based server aggregation to balance model updates across clients. Experiments on CIFAR10 and CIFAR100 show that FedU2 outperforms baselines in cross-device and cross-silo settings, with ablations confirming the contributions of both FUR and EUA and visualization analyses illustrating improved representation coherence.
Abstract
Federated learning achieves effective performance in modeling decentralized data. In practice, client data are not well labeled, which creates an opportunity for federated unsupervised learning (FUSL) with non-IID data. However, existing FUSL methods suffer from insufficient representations, i.e., (1) representation collapse entanglement among local and global models, and (2) inconsistent representation spaces among local models. The former means that representation collapse in a local model subsequently affects the global model and other local models. The latter means that clients model data representations with inconsistent parameters because supervision signals are lacking. In this work, we propose FedU2, which enhances the generation of uniform and unified representations in FUSL with non-IID data. Specifically, FedU2 consists of a flexible uniform regularizer (FUR) and an efficient unified aggregator (EUA). FUR in each client avoids representation collapse by dispersing samples uniformly, and EUA on the server promotes unified representations by constraining client model updates to be consistent. To extensively validate the performance of FedU2, we conduct both cross-device and cross-silo evaluation experiments on two benchmark datasets, i.e., CIFAR10 and CIFAR100.
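To make the notion of "dispersing samples uniformly" concrete, the following is a minimal illustrative sketch, not the paper's FUR. FUR is described as using unbalanced optimal transport toward a spherical Gaussian. The sketch swaps in a simpler, commonly used stand-in: a pairwise Gaussian-kernel uniformity loss on L2-normalized embeddings. The function names, the temperature `t`, and the toy embeddings are all assumptions made for illustration.

```python
import math
from itertools import combinations


def l2_normalize(v):
    """Scale a vector to unit length so it lies on the hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def uniformity_loss(embeddings, t=2.0):
    """Log of the mean Gaussian kernel over all pairs of normalized embeddings.

    Lower values mean the points are spread more evenly; a fully collapsed
    representation (all points identical) gives the maximum value of 0.
    This is a stand-in for illustration, not the optimal-transport-based FUR.
    """
    z = [l2_normalize(v) for v in embeddings]
    kernel_values = [
        math.exp(-t * sum((a - b) ** 2 for a, b in zip(zi, zj)))
        for zi, zj in combinations(z, 2)
    ]
    return math.log(sum(kernel_values) / len(kernel_values))


# Toy 2-D embeddings (hypothetical): one collapsed set, one evenly dispersed set.
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
dispersed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

print("collapsed:", uniformity_loss(collapsed))
print("dispersed:", uniformity_loss(dispersed))
```

In this toy setting the collapsed set scores higher (worse) than the dispersed set, which is the behavior a uniformity regularizer added to a client's local objective is meant to penalize.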
