FedCross: Towards Accurate Federated Learning via Multi-Model Cross-Aggregation

Ming Hu; Peiheng Zhou; Zhihao Yue; Zhiwei Ling; Yihao Huang; Anran Li; Yang Liu; Xiang Lian; Mingsong Chen

TL;DR

FedCross tackles gradient divergence in non-IID federated learning by introducing a multi-model cross-aggregation framework that trains multiple middleware models per round and fuses them via cross-aggregation. It replaces the single global-model paradigm with a fine-grained, multi-to-multi training scheme, including collaborative-model selection, cross-aggregation with a tunable warm-start parameter $\alpha$, and asynchronous global-model deployment. The paper provides a convergence analysis showing sublinear convergence and proposes acceleration methods (propeller models and dynamic $\alpha$) to speed up training. Extensive experiments on CIFAR-10/100, FEMNIST, Shakespeare, and Sent140 demonstrate significant accuracy improvements over FedAvg and other baselines across IID and non-IID settings, with no additional communication overhead, highlighting FedCross’s practical impact for scalable AIoT FL deployments.

Abstract

As a promising distributed machine learning paradigm, Federated Learning (FL) has attracted increasing attention as a way to address data silo problems without compromising user privacy. By adopting the classic one-to-multi training scheme (i.e., FedAvg), in which the cloud server dispatches a single global model to multiple participating clients, conventional FL methods can achieve collaborative model training without data sharing. However, because a single global model cannot always accommodate all the incompatible convergence directions of local models, existing FL approaches suffer from inferior classification accuracy. To address this issue, we present an efficient FL framework named FedCross, which uses a novel multi-to-multi FL training scheme based on our proposed multi-model cross-aggregation approach. Unlike traditional FL methods, in each round of FL training, FedCross uses multiple middleware models to conduct weighted fusion individually. Since the middleware models used by FedCross can quickly converge into the same flat valley of the loss landscape, the generated global model generalizes well. Experimental results on various well-known datasets show that, compared with state-of-the-art FL methods, FedCross can significantly improve FL accuracy in both IID and non-IID scenarios without causing additional communication overhead.

Paper Structure (38 sections, 3 theorems, 36 equations, 9 figures, 3 tables, 1 algorithm)

Introduction
Preliminaries and Related Work
Preliminaries
Related Work on FL Optimization
Our FedCross Approach
Overview of FedCross
The FedCross Algorithm
Collaborative Model Selection (CoModelSel)
Cross-Aggregation (CrossAggr)
Global Model Generation
Convergence Analysis
Notations
Proofs of Key Lemmas
Training Acceleration Methods for FedCross
Experimental Results
...and 23 more sections

Key Result

Lemma 3.4

Let $w_{r}^i = \alpha v_{r}^i + (1-\alpha)v_r^{i^\prime}$, $\alpha\in [0,1]$, and $\overline{w}_r = \sum_{i=1}^N w_{r}^i$. We have [inequality missing from the source], where $w^\star$ denotes the optimal parameters for the global loss function $F(\cdot)$. In other words, $\forall w, F^\star\leq F(w)$, where $F^\star$ denotes $F(w^\star)$.
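The fusion rule in the lemma, $w_r^i = \alpha v_r^i + (1-\alpha) v_r^{i^\prime}$, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the flat-list representation of model parameters, and the cyclic pairing rule `cyclic_collaborators` are all assumptions made for illustration. The summary names a collaborative-model selection step but does not specify how collaborators are chosen.

```python
def cross_aggregate(models, collaborators, alpha):
    """Fuse each uploaded model v_i with its collaborator v_{i'}.

    models:        list of parameter vectors (lists of floats), one per middleware model
    collaborators: collaborators[i] is the index i' paired with model i
    alpha:         weight in [0, 1] given to a model's own parameters
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    fused = []
    for i, j in enumerate(collaborators):
        # w_i = alpha * v_i + (1 - alpha) * v_{i'}
        fused.append([alpha * a + (1.0 - alpha) * b
                      for a, b in zip(models[i], models[j])])
    return fused


def cyclic_collaborators(num_models, round_idx):
    """Hypothetical pairing rule (an assumption, not taken from the source):
    shift indices by a round-dependent offset in 1..num_models-1, so no model
    is paired with itself. Requires num_models >= 2."""
    offset = round_idx % (num_models - 1) + 1
    return [(i + offset) % num_models for i in range(num_models)]


def average(models):
    """Element-wise mean of a list of parameter vectors."""
    n = len(models)
    return [sum(column) / n for column in zip(*models)]


# Three toy two-parameter "models" uploaded in one round.
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pairs = cyclic_collaborators(len(v), round_idx=0)
w = cross_aggregate(v, pairs, alpha=0.9)
```

Under this particular pairing each model serves as a collaborator exactly once, so the element-wise mean of the fused models equals the mean of the uploaded models. Other selection rules need not have this property.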

Figures (9)

Figure 1: A motivating example of FedAvg and FedCross training.
Figure 2: The FedCross Framework.
Figure 3: Data distributions of selected clients with different non-IID settings.
Figure 4: Comparison between loss landscapes of FedAvg and FedCross.
Figure 5: Learning curves of different FL methods on CIFAR-10 dataset.
...and 4 more figures

Theorems & Definitions (3)

Lemma 3.4
Lemma 3.5
Lemma 3.6
