FedBRB: An Effective Solution to the Small-to-Large Scenario in Device-Heterogeneity Federated Learning
Ziyue Xu, Mingfeng Xu, Tianchi Liao, Zibin Zheng, Chuan Chen
TL;DR
FedBRB tackles the small-to-large scenario in device-heterogeneity federated learning, where no client can locally train a model as large as the global model kept on the server. The scenario is formalized by the constraint that $\operatorname{dim}_d(\mathbf{W}_i^{t}) < \operatorname{dim}_d(\mathbf{W}_g^{t})$ for some dimension $d$, i.e., each local model $\mathbf{W}_i^{t}$ is smaller than the global model $\mathbf{W}_g^{t}$ along at least one dimension. FedBRB introduces Block-wise Rolling and Block Weighted Broadcast to ensure full parameter-space coverage while accelerating information exchange across sub-models. Empirical results on CIFAR-10/100 and MNIST show substantial gains over state-of-the-art methods (HeteroFL, FedRolex), with even minimal local models sometimes surpassing baselines that use larger models, especially under non-IID data. The approach lets resource-constrained institutions participate in large-scale FL without public data, SVD-heavy decompositions, or compromising privacy, making large-model collaboration more practical in heterogeneous environments.
Abstract
Recently, the success of large models has demonstrated the importance of scaling up model size. This has spurred interest in exploring collaborative training of large-scale models from a federated learning perspective. Due to computational constraints, many institutions struggle to train a large-scale model locally. Thus, training a larger global model using only smaller local models has become an important scenario (i.e., the \textbf{small-to-large scenario}). Although recent device-heterogeneity federated learning approaches have started to explore this area, they face limitations in fully covering the parameter space of the global model. In this paper, we propose a method called \textbf{FedBRB} (\underline{B}lock-wise \underline{R}olling and weighted \underline{B}roadcast) based on the block concept. FedBRB uses small local models to train all blocks of the large global model, and broadcasts the trained parameters to the entire parameter space for faster information exchange. Experiments demonstrate that FedBRB yields substantial performance gains, achieving state-of-the-art results in this scenario. Moreover, FedBRB using only minimal local models can even surpass baselines using larger local models.
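
The idea of covering every block of a large global model with a smaller local model can be illustrated with a toy example. The sketch below is not the authors' implementation: the row-wise block partition, the round-robin schedule in `rolling_block_index`, the helper `extract_block`, and the toy shapes are all assumptions made for illustration. It covers only the rolling part, since the abstract does not specify how the weighted broadcast is computed.

```python
import numpy as np

def rolling_block_index(round_idx: int, client_idx: int, num_blocks: int) -> int:
    """Hypothetical schedule: the selected block shifts by one every round."""
    return (round_idx + client_idx) % num_blocks

def extract_block(global_w: np.ndarray, block_idx: int, num_blocks: int) -> np.ndarray:
    """Return a view of one row-block of the global weight matrix (assumed layout)."""
    block_rows = global_w.shape[0] // num_blocks
    start = block_idx * block_rows
    return global_w[start:start + block_rows]

num_blocks = 4
global_w = np.zeros((8, 3))  # toy global weights: 8 rows split into 4 blocks of 2 rows

visited = set()
for round_idx in range(num_blocks):
    b = rolling_block_index(round_idx, client_idx=0, num_blocks=num_blocks)
    local_w = extract_block(global_w, b, num_blocks)
    # The local model is strictly smaller than the global one along dimension 0.
    assert local_w.shape[0] < global_w.shape[0]
    local_w += 1.0  # stand-in for local training; writes through the view
    visited.add(b)

# After num_blocks rounds, every block has been selected once.
coverage_complete = visited == set(range(num_blocks))
```

In this toy setup a single client touches all four blocks after four rounds, so no region of the global matrix is left untrained even though each local update only ever sees a quarter of the rows.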
