FedBRB: An Effective Solution to the Small-to-Large Scenario in Device-Heterogeneity Federated Learning

Ziyue Xu; Mingfeng Xu; Tianchi Liao; Zibin Zheng; Chuan Chen

TL;DR

FedBRB tackles the small-to-large scenario in device-heterogeneity federated learning, where no client can locally train a model as large as the global model kept on the server. Formally, the local model $\mathbf{W}_i^{t}$ and the global model $\mathbf{W}_g^{t}$ satisfy $\operatorname{dim}_d(\mathbf{W}_i^{t}) < \operatorname{dim}_d(\mathbf{W}_g^{t})$ for some dimension $d$. FedBRB introduces block-wise Rolling and Block Weighted Broadcast to cover the full parameter space of the global model while accelerating information exchange across sub-models. Empirical results on CIFAR-10/100 and MNIST show substantial gains over state-of-the-art methods (HeteroFL, FedRolex), with even minimal local models sometimes surpassing baselines that use larger models, especially under non-IID data. The approach lets resource-constrained institutions participate in large-scale FL without public data, SVD-heavy decompositions, or compromising privacy, making large-model collaboration more practical in heterogeneous environments.

Abstract

Recently, the success of large models has demonstrated the importance of scaling up model size. This has spurred interest in collaborative training of large-scale models from a federated learning perspective. Due to computational constraints, many institutions struggle to train a large-scale model locally. Thus, training a larger global model using only smaller local models has become an important scenario (i.e., the small-to-large scenario). Although recent device-heterogeneity federated learning approaches have started to explore this area, they fall short of fully covering the parameter space of the global model. In this paper, we propose a method called FedBRB (Block-wise Rolling and weighted Broadcast) based on the block concept. FedBRB can use small local models to train all blocks of the large global model, and it broadcasts the trained parameters to the entire parameter space for faster information interaction. Experiments demonstrate that FedBRB yields substantial performance gains, achieving state-of-the-art results in this scenario. Moreover, FedBRB using only minimal local models can even surpass baselines using larger local models.
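The claim that small local models can train all blocks of the global model can be illustrated with a minimal sketch. It assumes a single 2D parameter tensor, a client that holds exactly one block per round, and a row-major rolling order that advances one block per round. The function names (`block_ranges`, `rolled_block_index`, `coverage_after`, `covered_fraction`) and the rolling rule are hypothetical choices for illustration, not the paper's algorithm, and the broadcast step is omitted.

```python
def block_ranges(shape, grid):
    """Return (row_range, col_range) for every block, in row-major order."""
    rows, cols = shape
    grid_rows, grid_cols = grid
    if rows % grid_rows or cols % grid_cols:
        raise ValueError("grid must divide the tensor evenly")
    block_rows, block_cols = rows // grid_rows, cols // grid_cols
    return [
        (range(r * block_rows, (r + 1) * block_rows),
         range(c * block_cols, (c + 1) * block_cols))
        for r in range(grid_rows)
        for c in range(grid_cols)
    ]


def rolled_block_index(round_idx, num_blocks, start=0):
    """Assumed rolling rule: move to the next block each round, wrapping around."""
    return (start + round_idx) % num_blocks


def coverage_after(rounds, shape=(8, 8), grid=(4, 4)):
    """Count how many times each parameter is trained by a one-block client."""
    blocks = block_ranges(shape, grid)
    trained = [[0] * shape[1] for _ in range(shape[0])]
    for t in range(rounds):
        row_range, col_range = blocks[rolled_block_index(t, len(blocks))]
        for i in row_range:
            for j in col_range:
                trained[i][j] += 1  # stand-in for a local training step
    return trained


def covered_fraction(trained):
    """Fraction of parameters that were trained at least once."""
    cells = [count for row in trained for count in row]
    return sum(count > 0 for count in cells) / len(cells)


if __name__ == "__main__":
    for rounds in (4, 16):
        print(rounds, "rounds:", covered_fraction(coverage_after(rounds)))
```

Under these assumptions, a client holding 1/16 of the tensor touches every parameter once the rolling index has visited all 16 blocks, whereas a fixed partition would stay at 1/16 coverage indefinitely.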

Paper Structure (34 sections, 4 equations, 6 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Heterogeneity Federated Learning.
Knowledge Distillation.
Tensor Decomposition.
Model Partitioning.
Method
Notation.
Model Partitioning
$\alpha$. Neuron Perspective
$\beta$. Parameter Tensor Perspective
The Proposed FedBRB
Block-wise Partitioning and Rolling.
Block Weighted Broadcast.
Comparing to FedRolex.
...and 19 more sections

Figures (6)

Figure 1: An illustration of the small-to-large scenario, where the global model is larger than any client's local model. Squares of different sizes represent models of different sizes, and the fraction in each square denotes its model size.
Figure 2: Model partitioning from the neuron perspective.
Figure 3: (a) Model partitioning from the parameter tensor perspective. (b) Fixed partitioning, where the trained area remains fixed over rounds. (c) Rolling partitioning, where the trained area changes over rounds. (d) An example in which the baseline FedRolex trains a $1/4$ sub-model on a 512$\times$512 channel tensor, leaving a large portion of the parameters untrained (black).
Figure 4: Overview of FedBRB. (a) The global model is partitioned into 16 blocks based on the size of the smallest client, B. Client B uses 1 block as its sub-model, while client A combines 4 blocks as its sub-model. (b) Clients upload their updates. (c) Weighted broadcasting shares the updated blocks with all other positions, further increasing the frequency of information interaction between sub-models. (d) After server aggregation, the block partition position rolls to the next index, enabling FedBRB to train all parameters of the global model over successive rounds.
Figure 5: (Q2) Performance of $e$-level full-model-size training (blue) and FedBRB in the small-to-large scenario (red) on CIFAR-10 under the dynamic setting. As the value of x in a0-$\underline{x}$1-e1 increases from e to b, the performance of FedBRB gradually surpasses the blue baseline. This demonstrates the vital role of relatively large local models in FedBRB.
...and 1 more figures
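The block arithmetic behind the Figure 4(a) caption can be worked through with a short sketch. It borrows the 512$\times$512 tensor from the Figure 3(d) example purely for illustration and assumes that client A's 4 blocks form a contiguous 2$\times$2 group. That layout, the constants, and the helper names `submodel_shape` and `param_fraction` are assumptions, not details confirmed by the captions.

```python
def submodel_shape(global_shape, grid, blocks_per_side):
    """Shape of a sub-model built from a square group of blocks."""
    rows, cols = global_shape
    grid_rows, grid_cols = grid
    if rows % grid_rows or cols % grid_cols:
        raise ValueError("grid must divide the tensor evenly")
    return (rows // grid_rows * blocks_per_side,
            cols // grid_cols * blocks_per_side)


def param_fraction(global_shape, sub_shape):
    """Fraction of the global tensor's parameters held by the sub-model."""
    return (sub_shape[0] * sub_shape[1]) / (global_shape[0] * global_shape[1])


GLOBAL_SHAPE = (512, 512)  # illustrative size, taken from the Figure 3(d) example
GRID = (4, 4)              # 4 x 4 = 16 blocks, as in Figure 4(a)
# Blocks per side for each client: A = 2 x 2 = 4 blocks (assumed layout), B = 1 block.
CLIENT_BLOCKS_PER_SIDE = {"A": 2, "B": 1}

if __name__ == "__main__":
    for name, side in CLIENT_BLOCKS_PER_SIDE.items():
        shape = submodel_shape(GLOBAL_SHAPE, GRID, side)
        print(name, shape, param_fraction(GLOBAL_SHAPE, shape))
```

With these numbers, client B trains a 128$\times$128 block (1/16 of the parameters) and client A a 256$\times$256 group (1/4 of the parameters), so the block size is set by the smallest client and larger clients cover several blocks at once.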

Theorems & Definitions (2)

Definition 1
Definition 2
