Enhancing One-Shot Federated Learning Through Data and Ensemble Co-Boosting

Rong Dai, Yonggang Zhang, Ang Li, Tongliang Liu, Xun Yang, Bo Han

TL;DR

This paper tackles a bottleneck of one-shot federated learning (OFL): the server model's performance depends on both the synthesized data and the ensemble of client models. It introduces Co-Boosting, an adversarial, mutually reinforcing framework in which hard samples generated from the current ensemble drive the reweighting of client logits, and the improved ensemble in turn yields higher-quality data for distillation into the server model $\theta_S$. Key contributions include a hard-sample generation loss $\mathcal{L}_{H}$, an adversarial loss $\mathcal{L}_{A}$, an on-the-fly diversification step for hard samples, a learnable ensemble weighting $\bm{w}$ updated via a gradient-sign method, and a joint min-max objective $\min_{\bm{w}} \max_{\delta \in S} \ell_{CE}(\sum_{k} w_k f_k(x_s+\delta;\bm{\theta}_k), y_s)$. The method is architecture-agnostic and requires neither changes to local training nor additional data transmission, making it practical for modern model-market scenarios. Experiments across five datasets and varied non-IID settings show substantial server accuracy gains over strong baselines, supporting the efficacy and robustness of the mutual boosting paradigm.

Abstract

One-shot Federated Learning (OFL) has become a promising learning paradigm, enabling the training of a global server model via a single communication round. In OFL, the server model is aggregated by distilling knowledge from all client models (the ensemble), which are also responsible for synthesizing samples for distillation. In this regard, advanced works show that the performance of the server model is intrinsically related to the quality of the synthesized data and the ensemble model. To promote OFL, we introduce a novel framework, Co-Boosting, in which synthesized data and the ensemble model mutually enhance each other progressively. Specifically, Co-Boosting leverages the current ensemble model to synthesize higher-quality samples in an adversarial manner. These hard samples are then employed to promote the quality of the ensemble model by adjusting the ensembling weights for each client model. Consequently, Co-Boosting periodically achieves high-quality data and ensemble models. Extensive experiments demonstrate that Co-Boosting can substantially outperform existing baselines under various settings. Moreover, Co-Boosting eliminates the need for adjustments to the client's local training, requires no additional data or model transmission, and allows client models to have heterogeneous architectures.
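The interplay between hard samples and ensemble weights can be illustrated with a toy sketch. This is not the authors' implementation: it assumes linear client models, a single gradient-sign step for the inner maximization over the perturbation, and a clip-and-renormalize step for the weights. The names `co_boost_step`, `eps`, and `mu` are hypothetical and chosen only for illustration.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def matvec(W, x):
    return [sum(a * b for a, b in zip(row, x)) for row in W]

def ensemble_logits(w, clients, x):
    # Weighted sum of client logits: sum_k w_k * f_k(x).
    per_client = [matvec(W, x) for W in clients]
    n_cls = len(per_client[0])
    z = [sum(w[k] * per_client[k][c] for k in range(len(clients)))
         for c in range(n_cls)]
    return z, per_client

def ce_loss(w, clients, x, y):
    z, _ = ensemble_logits(w, clients, x)
    return -math.log(softmax(z)[y])

def sign(v):
    return (v > 0) - (v < 0)

def grads(w, clients, x, y):
    # Analytic gradients of the cross-entropy for linear clients (toy assumption).
    z, per_client = ensemble_logits(w, clients, x)
    p = softmax(z)
    r = [p[c] - (1.0 if c == y else 0.0) for c in range(len(p))]  # dL/dz
    g_w = [sum(r[c] * per_client[k][c] for c in range(len(r)))
           for k in range(len(clients))]
    g_x = [sum(r[c] * sum(w[k] * clients[k][c][j] for k in range(len(clients)))
               for c in range(len(r)))
           for j in range(len(x))]
    return g_w, g_x

def co_boost_step(w, clients, x, y, eps=0.1, mu=0.05):
    # Inner max: one gradient-sign ascent step on the input (L_inf radius eps).
    _, g_x = grads(w, clients, x, y)
    x_hard = [xi + eps * sign(g) for xi, g in zip(x, g_x)]
    # Outer min: one gradient-sign descent step on the weights, using the hard sample.
    g_w, _ = grads(w, clients, x_hard, y)
    w_new = [max(wk - mu * sign(g), 0.0) for wk, g in zip(w, g_w)]
    total = sum(w_new)
    # Clipping at zero and renormalizing are assumptions of this sketch.
    w_new = [v / total for v in w_new] if total > 0 else list(w)
    return w_new, x_hard

# Two toy clients on a 2-class problem: client 0 is reliable, client 1 swaps the classes.
clients = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.5], [0.5, 0.0]],
]
w = [0.5, 0.5]          # start from the averaged ensemble
x, y = [1.0, 0.0], 0    # a sample of class 0
w_new, x_hard = co_boost_step(w, clients, x, y)
```

In this toy setting the perturbed sample raises the ensemble's loss, and the weight update shifts mass toward the client that still classifies it correctly, which is the mutual-boosting effect the abstract describes.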

Paper Structure (28 sections, 13 equations, 5 figures, 19 tables, 1 algorithm)

Figures (5)

  • Figure 1: Co-Boosting Framework and Experimental Comparison. (a) illustrates the core concept of our approach. In each epoch, high-quality samples are first generated from the previous epoch's ensemble and server model; these samples are then used to adjust the client weights, yielding a better ensemble. The server model is then updated by distilling knowledge from the enriched data and the refined ensemble. (b) shows the test accuracy of the ensemble with averaged weights, weights learned on real data, and weights learned on hard samples. (c) shows the test accuracy of the server model obtained through distillation on real, synthetic, and hard samples with the same ensemble. (d) presents an overall comparison. DENSE (Zhang et al., 2022) is the current state of the art; FedENS denotes the averaged ensemble. The experiments in (b), (c), and (d) are all conducted on CIFAR-10 with 10 clients under a $Dir(0.1)$ partition.
  • Figure 2: Test accuracy of server
  • Figure 3: Comparison with multi-round federated learning methods.
  • Figure 4: Visualization of synthetic data on MNIST dataset.
  • Figure 5: Visualization of synthetic data on CIFAR-10 dataset.