Bridging Memory Gaps: Scaling Federated Learning for Heterogeneous Clients

Yebo Wu; Jingguang Li; Chunlin Tian; Kahou Tam; Li Li; Chengzhong Xu

TL;DR

This work tackles the memory bottleneck in federated learning by introducing ScaleFL, a scalable framework that trains a global model in sequential blocks. It couples a Curriculum Mentor, based on information bottleneck principles and HSIC estimates, with a Training Harmonizer that enables bidirectional information flow across blocks, thereby mitigating information loss and gradient isolation. Empirical results across diverse datasets, device heterogeneity, and even Transformer-based models demonstrate substantial gains in accuracy, memory efficiency, and convergence speed, including non-IID scenarios and large-scale benchmarks. Theoretical convergence guarantees further support the approach, showing that ScaleFL converges to a stationary point under standard smoothness and bounded-gradient assumptions, with the curriculum and co-adaptation components key to stability and performance.
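The HSIC estimates mentioned above can be illustrated with a minimal sketch of the standard biased empirical estimator, $\mathrm{HSIC}(X,Y) = \mathrm{tr}(KHLH)/(n-1)^2$, using Gaussian kernels. This is not the paper's code: the function names, the kernel bandwidth `sigma`, and the CKA-style normalization in `nhsic` are assumptions made for illustration, and the paper's own nHSIC definition may differ.

```python
import math

def gaussian_gram(xs, sigma=1.0):
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(xs)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                      / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]

def center(K):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]

def hsic(xs, ys, sigma=1.0):
    """Biased empirical HSIC between two samples of equal size n."""
    n = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    Lc = center(gaussian_gram(ys, sigma))
    # tr(K H L H) equals the elementwise inner product of the centered,
    # symmetric Gram matrices.
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2

def nhsic(xs, ys, sigma=1.0):
    """Assumed normalization: HSIC(X,Y) / sqrt(HSIC(X,X) * HSIC(Y,Y))."""
    denom = math.sqrt(hsic(xs, xs, sigma) * hsic(ys, ys, sigma))
    return hsic(xs, ys, sigma) / denom if denom > 0 else 0.0

# Toy data: each sample is a list of features.
inputs = [[0.0], [1.0], [2.0], [3.0]]
features = [[0.1], [0.9], [2.2], [2.8]]
dependence = nhsic(inputs, features)
```

In an information bottleneck setting, a block's representation would typically be scored by its dependence on the inputs and on the labels; how ScaleFL combines these terms into a per-block objective is defined in the paper, not here.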

Abstract

Federated Learning (FL) enables multiple clients to collaboratively train a shared model while preserving data privacy. However, the high memory demand during model training severely limits the deployment of FL on resource-constrained clients. To this end, we propose ScaleFL, a scalable and inclusive FL framework designed to overcome memory limitations through sequential block-wise training. The core idea of ScaleFL is to partition the global model into blocks and train them sequentially, thereby reducing training memory requirements. To mitigate information loss during block-wise training, ScaleFL introduces a Curriculum Mentor that crafts curriculum-aware training objectives for each block to steer their learning process. Moreover, ScaleFL incorporates a Training Harmonizer that designs a parameter co-adaptation training scheme to coordinate block updates, effectively breaking inter-block information isolation. Extensive experiments on both simulation and hardware testbeds demonstrate that ScaleFL significantly improves model performance by up to 84.2%, reduces peak memory usage by up to 50.4%, and accelerates training by up to 1.9$\times$.
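The sequential block-wise schedule described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: blocks are plain lists of floats, `local_update` is a hypothetical client-side routine, and aggregation is assumed to be a FedAvg-style average weighted by client sample counts. The Curriculum Mentor and Training Harmonizer are deliberately omitted.

```python
def aggregate_block(client_blocks, client_sizes):
    """Sample-weighted average of one block's parameters (assumed FedAvg-style)."""
    total = sum(client_sizes)
    dim = len(client_blocks[0])
    return [sum(w[k] * n for w, n in zip(client_blocks, client_sizes)) / total
            for k in range(dim)]

def sequential_blockwise_training(global_blocks, clients, rounds_per_block, local_update):
    """Train blocks one at a time; all blocks other than the active one are left unchanged."""
    for t in range(len(global_blocks)):
        for _ in range(rounds_per_block):
            updates, sizes = [], []
            for client in clients:
                # Only block t is trainable on the client, so frozen blocks
                # need no gradients or optimizer state.
                updates.append(local_update(global_blocks, t, client))
                sizes.append(client["num_samples"])
            global_blocks[t] = aggregate_block(updates, sizes)
    return global_blocks

def toy_local_update(blocks, t, client):
    """Hypothetical local step: move block t halfway toward a client target."""
    return [w + 0.5 * (client["target"] - w) for w in blocks[t]]

clients = [
    {"num_samples": 1, "target": 0.0},
    {"num_samples": 3, "target": 4.0},
]
trained = sequential_blockwise_training(
    [[0.0, 0.0], [0.0]], clients, rounds_per_block=2, local_update=toy_local_update
)
```

In the toy run, each block drifts toward the sample-weighted client target, which shows the schedule and the aggregation step only. A real client would run gradient descent on the active block with its local data.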

Paper Structure (25 sections, 1 theorem, 13 equations, 8 figures, 2 tables, 1 algorithm)

Introduction
Background and Motivation
The Sequential Block-Wise Training Paradigm
Limitations of Sequential Block-Wise Training
ScaleFL
Curriculum Mentor
Training Harmonizer
Evaluation
Experimental Setup
Main Results
Generalization Analysis
Hardware Evaluation
Ablation Study
Related Work
Conclusion
...and 10 more sections

Key Result

Theorem 1

Under the smoothness and bounded-gradient assumptions above, let $\{\Theta_{g,t}^r\}$ be the sequence of global models produced by ScaleFL, where each block $\theta_t$ is updated using the curriculum-aware objective in Equation IB_loss2 and aggregated via Equation eq_appendix_aggre with a sufficien[...] where $R$ is the total number of rounds. Consequently, the average gradient norm converges to $0$.

Figures (8)

Figure 1: Workflow of the sequential block-wise training.
Figure 2: Performance comparison of FedAvg, TheoFL, and the sequential block-wise training paradigm (denoted as SBT) on CIFAR10 and CIFAR100.
Figure 3: nHSIC plane dynamics for different blocks of ResNet18 trained on CIFAR10, with the model divided into four blocks. The color gradation shows the training progress, i.e., the number of rounds.
Figure 4: Illustration of the Training Harmonizer, which coordinates block updates through a parameter co-adaptation scheme. The global model is partitioned into four blocks, each composed of two layers.
Figure 5: Performance comparison on large-scale datasets and Transformer-based models.
...and 3 more figures

Theorems & Definitions (2)

Theorem 1
proof
