Towards Optimal Customized Architecture for Heterogeneous Federated Learning with Contrastive Cloud-Edge Model Decoupling
Xingyan Chen, Tian Du, Mu Wang, Tiancheng Gu, Yu Zhao, Gang Kou, Changqiao Xu, Dapeng Oliver Wu
TL;DR
The paper tackles non-IID data heterogeneity in federated learning by proposing FedCMD, a cloud-edge framework that decouples models into a shared body $\omega$ and a personalized head $\phi$, and dynamically selects the personalized layer $l^*$ using a Wasserstein-based feature distribution transfer metric. It introduces a two-phase approach: (1) personalized layer selection via a contrastive layer selection mechanism and (2) heterogeneous FL with a weighted aggregation guided by layer similarities, maintaining $\phi_i$ locally while updating $\omega$ across clients. The authors demonstrate, through extensive experiments on ten benchmarks and comparisons to nine baselines, that FedCMD achieves superior accuracy and robustness to non-IID data, with favorable scalability and manageable communication overhead. The work advances personalized layer selection by quantifying cross-client data distribution shifts and provides practical algorithms and complexity analyses, offering meaningful improvements for real-world cloud-edge federated systems.
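The decoupled update summarized above can be illustrated with a minimal sketch. It assumes each client's parameters are a plain dictionary from layer name to a flat list of floats, and that the server already holds one non-negative weight per client. The names `aggregate_body`, `apply_global_body`, and `head_layers`, and the weight values used here, are hypothetical and not taken from the paper, whose actual similarity-guided weighting may differ.

```python
from typing import Dict, List, Set

Params = Dict[str, List[float]]


def aggregate_body(clients: List[Params], weights: List[float],
                   head_layers: Set[str]) -> Params:
    """Weighted average of the shared body; head layers are skipped."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    body: Params = {}
    for name, reference in clients[0].items():
        if name in head_layers:
            continue  # the personalized head (phi_i) never leaves the client
        body[name] = [
            sum(w * c[name][k] for w, c in zip(weights, clients)) / total
            for k in range(len(reference))
        ]
    return body


def apply_global_body(client: Params, body: Params) -> Params:
    """Overwrite the client's body (omega) and keep its own head."""
    updated = {name: list(values) for name, values in client.items()}
    for name, values in body.items():
        updated[name] = list(values)
    return updated


# Toy example: "conv" plays the shared body, "fc" the personalized head.
clients = [
    {"conv": [1.0, 2.0], "fc": [10.0]},
    {"conv": [3.0, 4.0], "fc": [20.0]},
]
weights = [1.0, 3.0]  # assumed stand-in for similarity-based weights
global_body = aggregate_body(clients, weights, head_layers={"fc"})
client0 = apply_global_body(clients[0], global_body)
```

In this toy run the body layer is averaged with weights 1 and 3, while each client's `fc` parameters stay as they were locally.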
Abstract
Federated learning, a promising distributed learning paradigm, enables collaborative training of a global model across multiple network edge clients without centralized data collection. However, heterogeneity in edge data distributions drags each local model towards its own local minimum, which can be distant from the global optimum. Such heterogeneity often leads to slow convergence and substantial communication overhead. To address these issues, we propose FedCMD, a novel model-decoupling framework tailored to cloud-edge supported federated learning that separates deep neural networks into a body, which captures shared representations in the cloud, and a personalized head, which mitigates data heterogeneity. Our motivation is as follows: through an in-depth investigation of the performance obtained when different neural network layers are selected as the personalized head, we found that rigidly assigning the last layer as the personalized head, as current studies do, is not always optimal. Instead, it is necessary to dynamically select the personalized layer that maximizes training performance by taking the representation difference between neighboring layers into account. To find the optimal personalized layer, we use the low-dimensional representation of each layer to contrast feature distribution transfer and introduce a Wasserstein-based layer selection method that identifies the best-matching layer for personalization. Additionally, we propose a weighted global aggregation algorithm based on the selected personalized layer for the practical application of FedCMD. Extensive experiments on ten benchmarks demonstrate the efficiency and superior performance of our solution compared with nine state-of-the-art solutions. All code and results are available at https://github.com/elegy112138/FedCMD.
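To make the idea of contrasting feature distributions between neighboring layers concrete, the following is a minimal sketch rather than the paper's algorithm. It assumes each layer's representation has already been reduced to a one-dimensional sample of equal size, and it assumes, purely for illustration, that the layer with the largest Wasserstein shift from its predecessor is the one selected. The function names and the toy feature values are hypothetical.

```python
from typing import List, Sequence


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For uniform weights and equal sizes, this is the mean absolute
    difference between the sorted samples.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def select_personalized_layer(layer_features: List[Sequence[float]]) -> int:
    """Return the index of the layer with the largest shift from its predecessor.

    Assumption: "largest neighbor-layer shift" is an illustrative selection
    rule, not necessarily the criterion used by FedCMD.
    """
    if len(layer_features) < 2:
        raise ValueError("need at least two layers to compare")
    shifts = [
        wasserstein_1d(layer_features[l - 1], layer_features[l])
        for l in range(1, len(layer_features))
    ]
    best = max(range(len(shifts)), key=shifts.__getitem__)
    return best + 1  # shifts[0] belongs to layer 1


# Toy 1-D projections of the features produced by four consecutive layers.
features = [
    [0.0, 1.0, 2.0],
    [0.1, 1.1, 2.1],
    [3.0, 4.0, 5.0],
    [3.1, 4.2, 5.0],
]
selected = select_personalized_layer(features)
```

Here the distribution changes most between layers 1 and 2, so `selected` is 2. A real implementation would have to choose how to project high-dimensional activations and how to combine the distances across clients, which this sketch leaves open.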
