FedCross: Towards Accurate Federated Learning via Multi-Model Cross-Aggregation
Ming Hu, Peiheng Zhou, Zhihao Yue, Zhiwei Ling, Yihao Huang, Anran Li, Yang Liu, Xiang Lian, Mingsong Chen
TL;DR
FedCross tackles gradient divergence in non-IID federated learning by introducing a multi-model cross-aggregation framework that trains multiple middleware models per round and fuses them via cross-aggregation. It replaces the single global-model paradigm with a fine-grained, multi-to-multi training scheme, including collaborative-model selection, cross-aggregation with a tunable warm-start parameter $\alpha$, and asynchronous global-model deployment. The paper provides a convergence analysis showing sublinear convergence and proposes acceleration methods (propeller models and dynamic $\alpha$) to speed up training. Extensive experiments on CIFAR-10/100, FEMNIST, Shakespeare, and Sent140 demonstrate significant accuracy improvements over FedAvg and other baselines across IID and non-IID settings, with no additional communication overhead, highlighting FedCross’s practical impact for scalable AIoT FL deployments.
Abstract
As a promising distributed machine learning paradigm, Federated Learning (FL) has attracted increasing attention as a way to address data silo problems without compromising user privacy. By adopting the classic one-to-multi training scheme (i.e., FedAvg), where the cloud server dispatches a single global model to multiple involved clients, conventional FL methods can achieve collaborative model training without data sharing. However, because a single global model cannot always accommodate the incompatible convergence directions of all local models, existing FL approaches suffer from inferior classification accuracy. To address this issue, we present an efficient FL framework named FedCross, which uses a novel multi-to-multi FL training scheme based on our proposed multi-model cross-aggregation approach. Unlike traditional FL methods, in each round of FL training, FedCross uses multiple middleware models and performs weighted fusion on each of them individually. Since the middleware models used by FedCross can quickly converge into the same flat valley of the loss landscape, the generated global model can generalize well. Experimental results on various well-known datasets show that, compared with state-of-the-art FL methods, FedCross can significantly improve FL accuracy in both IID and non-IID scenarios without causing additional communication overhead.
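The multi-to-multi scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions not stated in this section: each model is a flat list of floats, the weighted fusion is a convex combination in which $\alpha$ is the weight kept on a middleware model's own parameters, the collaborative model is chosen by a simple in-order rotation, and the global model is the plain average of the middleware models. All function names are hypothetical.

```python
from typing import List

Model = List[float]  # assumption: a model is a flattened parameter vector


def select_collaborator(i: int, round_idx: int, k: int) -> int:
    """Hypothetical in-order selection: rotate through the other k-1 models."""
    if k < 2:
        raise ValueError("cross-aggregation needs at least two middleware models")
    offset = round_idx % (k - 1) + 1  # always in 1..k-1, so never picks i itself
    return (i + offset) % k


def cross_aggregate(own: Model, collab: Model, alpha: float) -> Model:
    """Assumed fusion rule: alpha * own + (1 - alpha) * collaborator."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(own, collab)]


def fedcross_round(uploaded: List[Model], round_idx: int, alpha: float) -> List[Model]:
    """Fuse every uploaded model with its collaborator; returns k new middleware models."""
    k = len(uploaded)
    return [
        cross_aggregate(uploaded[i], uploaded[select_collaborator(i, round_idx, k)], alpha)
        for i in range(k)
    ]


def global_model(middleware: List[Model]) -> Model:
    """Assumed deployment step: element-wise average of all middleware models."""
    k = len(middleware)
    return [sum(ws) / k for ws in zip(*middleware)]


if __name__ == "__main__":
    # Three toy one-parameter models standing in for locally trained uploads.
    uploads = [[0.0], [1.0], [2.0]]
    fused = fedcross_round(uploads, round_idx=0, alpha=0.9)
    print(fused, global_model(fused))
```

In this sketch the server still sends and receives one model per selected client each round, which is consistent with the claim of no additional communication overhead. Under these assumptions, a larger $\alpha$ keeps each middleware model closer to its own trained weights.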
