Bridging Memory Gaps: Scaling Federated Learning for Heterogeneous Clients

Yebo Wu, Jingguang Li, Chunlin Tian, Kahou Tam, Li Li, Chengzhong Xu

TL;DR

This work tackles the memory bottleneck in federated learning by introducing ScaleFL, a scalable framework that partitions the global model into blocks and trains them sequentially. It couples a Curriculum Mentor, based on information bottleneck principles and HSIC estimates, with a Training Harmonizer that enables bidirectional information flow across blocks, thereby mitigating information loss and gradient isolation. Experiments across diverse datasets, heterogeneous devices, non-IID data, large-scale benchmarks, and Transformer-based models show substantial gains in accuracy, memory efficiency, and convergence speed. A convergence analysis further supports the approach, showing that ScaleFL converges to a stationary point under standard smoothness and bounded-gradient assumptions. The curriculum and co-adaptation components are key to its stability and performance.

Abstract

Federated Learning (FL) enables multiple clients to collaboratively train a shared model while preserving data privacy. However, the high memory demand during model training severely limits the deployment of FL on resource-constrained clients. To this end, we propose ScaleFL, a scalable and inclusive FL framework designed to overcome memory limitations through sequential block-wise training. The core idea of ScaleFL is to partition the global model into blocks and train them sequentially, thereby reducing training memory requirements. To mitigate information loss during block-wise training, ScaleFL introduces a Curriculum Mentor that crafts curriculum-aware training objectives for each block to steer its learning process. Moreover, ScaleFL incorporates a Training Harmonizer that designs a parameter co-adaptation training scheme to coordinate block updates, effectively breaking inter-block information isolation. Extensive experiments on both simulation and hardware testbeds demonstrate that ScaleFL significantly improves model performance by up to 84.2%, reduces peak memory usage by up to 50.4%, and accelerates training by up to $1.9\times$.
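The sequential block-wise idea described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative toy, not the authors' implementation: the tanh-linear blocks, the per-stage auxiliary output head, the squared-error loss, plain sample-weighted FedAvg aggregation, and all function names (`forward_frozen`, `local_update`, `fedavg`, `train_blockwise`, `global_loss`) are assumptions made for the example. The Curriculum Mentor and Training Harmonizer are deliberately left out.

```python
import numpy as np

def forward_frozen(blocks, x):
    """Run x through already-trained blocks; inference only, no gradients kept."""
    h = x
    for w in blocks:
        h = np.tanh(h @ w)
    return h

def global_loss(blocks, head, clients):
    """Average 0.5 * squared error over all client samples."""
    total, count = 0.0, 0
    for x, y in clients:
        pred = forward_frozen(blocks, x) @ head
        total += 0.5 * float(np.sum((pred - y) ** 2))
        count += x.shape[0]
    return total / count

def local_update(frozen, w, head, x, y, lr=0.1, steps=5):
    """One client's local training of the active block and its output head.

    ASSUMPTION: the auxiliary head and squared-error loss are illustrative
    choices, not taken from the paper.
    """
    feat = forward_frozen(frozen, x)      # frozen prefix is evaluated once
    w, head = w.copy(), head.copy()       # never mutate the global parameters
    n = x.shape[0]
    for _ in range(steps):
        h = np.tanh(feat @ w)
        err = (h @ head - y) / n          # d(loss)/d(pred) for 0.5 * mean sq. error
        grad_head = h.T @ err
        grad_w = feat.T @ ((err @ head.T) * (1.0 - h ** 2))
        w -= lr * grad_w
        head -= lr * grad_head
    return w, head

def fedavg(params, sizes):
    """Sample-size-weighted average of per-client parameter arrays."""
    total = float(sum(sizes))
    return sum(p * (s / total) for p, s in zip(params, sizes))

def train_blockwise(clients, dims, out_dim, rounds=20, seed=0):
    """Train blocks one at a time; blocks trained earlier stay frozen."""
    rng = np.random.default_rng(seed)
    trained, history, head = [], [], None
    sizes = [x.shape[0] for x, _ in clients]
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        w = rng.normal(scale=0.5, size=(d_in, d_out))
        head = rng.normal(scale=0.5, size=(d_out, out_dim))
        for _ in range(rounds):
            updates = [local_update(trained, w, head, x, y) for x, y in clients]
            w = fedavg([u[0] for u in updates], sizes)
            head = fedavg([u[1] for u in updates], sizes)
        trained.append(w)                 # freeze the block just trained
        history.append(global_loss(trained, head, clients))
    return trained, head, history
```

In this sketch each client only holds gradients for the active block and its head at any time, which is the source of the memory saving that block-wise training aims for. The frozen prefix is used for forward passes only.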

Paper Structure

This paper contains 25 sections, 1 theorem, 13 equations, 8 figures, 2 tables, and 1 algorithm.

Key Result

Theorem 1

Under the smoothness and bounded-gradient assumptions above, let $\{\Theta_{g,t}^r\}$ be the sequence of global models produced by ScaleFL, where each block $\theta_t$ is updated using the curriculum-aware objective in Equation IB_loss2 and aggregated via Equation eq_appendix_aggre with a sufficiently [...] where $R$ is the total number of rounds. Consequently, the average gradient norm converges to $0$.

Figures (8)

  • Figure 1: Workflow of the sequential block-wise training.
  • Figure 2: Performance comparison of FedAvg, TheoFL, and the sequential block-wise training paradigm (denoted as SBT) on CIFAR10 and CIFAR100.
  • Figure 3: nHSIC plane dynamics for different blocks of ResNet18 trained on CIFAR10, with the model divided into four blocks. The color gradation shows the training progress, i.e., the number of rounds.
  • Figure 4: Illustration of the Training Harmonizer, which coordinates block updates through a parameter co-adaptation scheme. The global model is partitioned into four blocks, each composed of two layers.
  • Figure 5: Performance comparison on large-scale datasets and Transformer-based models.
  • ...and 3 more figures
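
Figure 3 tracks block representations in the nHSIC plane. As a rough illustration of how such a dependence score can be estimated from a batch, the sketch below computes the standard biased empirical HSIC with Gaussian kernels, plus a normalized variant. The normalization used here (dividing by the geometric mean of the two self-HSIC values), the bandwidth parameter `sigma`, and the function names are assumptions for illustration. The paper's exact nHSIC definition may differ.

```python
import numpy as np

def gaussian_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of x: (n, d) -> (n, n)."""
    sq_norms = np.sum(x * x, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    k = gaussian_kernel(x, sigma)
    l = gaussian_kernel(y, sigma)
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2

def nhsic(x, y, sigma=1.0):
    """ASSUMPTION: normalize HSIC(x, y) by sqrt(HSIC(x, x) * HSIC(y, y))."""
    denom = np.sqrt(hsic(x, x, sigma) * hsic(y, y, sigma))
    return hsic(x, y, sigma) / denom if denom > 0 else 0.0
```

With this normalization a representation compared against itself scores 1, and weakly dependent pairs score close to 0, which makes scores from different blocks easier to compare on a common scale.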

Theorems & Definitions (2)

  • Theorem 1
  • proof