Breaking the Memory Wall for Heterogeneous Federated Learning via Progressive Training

Yebo Wu; Li Li; Chengzhong Xu

TL;DR

ProFL addresses memory limitations in heterogeneous federated learning (FL) by progressively training a global model that is partitioned into blocks. Training proceeds in two stages: progressive model shrinking, which prepares a block-specific output module and initial parameters for each block, followed by progressive model growing, which trains the blocks front-to-back. A scalar-based block freezing criterion decides when each block has converged and can be frozen, which ensures convergence. The authors prove convergence under standard optimization assumptions and report substantial practical benefits: up to $57.4\%$ reduction in peak memory footprint and up to $82.4\%$ improvement in accuracy across diverse models and datasets, including ViT and FEMNIST. The work also shows compatibility with existing FL methods and scalability to large-scale settings, offering a practical path for memory-constrained devices to participate in collaborative learning.
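To see why training one block at a time lowers the peak memory footprint, the Python sketch below compares a simplified memory model of full-model training with the growing stage. It is not ProFL's own accounting. It assumes fp32 parameters, an Adam-style optimizer (weights, gradients, and two moment buffers per trainable parameter), that frozen front blocks keep only their weights and store no activations for backpropagation, and that later blocks are not instantiated because the active block's output module stands in for them. All sizes in the example are hypothetical.

```python
from typing import List

BYTES_PER_PARAM = 4    # assumption: fp32 parameters
TRAINABLE_COPIES = 4   # assumption: weights + gradients + two Adam moment buffers


def full_training_memory(params: List[int], acts: List[int]) -> int:
    """Bytes needed to train all blocks at once.

    Every parameter carries full training state, and every block's
    activations are kept for backpropagation.
    """
    return sum(TRAINABLE_COPIES * BYTES_PER_PARAM * p + a
               for p, a in zip(params, acts))


def progressive_peak_memory(params: List[int], acts: List[int],
                            op_params: List[int], op_acts: List[int]) -> int:
    """Peak bytes over the growing steps t = 0, ..., T-1.

    At step t, blocks before t are frozen (weights only, no stored
    activations), block t and its output module are trainable, and
    blocks after t are not instantiated.
    """
    peak = 0
    for t in range(len(params)):
        frozen = BYTES_PER_PARAM * sum(params[:t])
        trainable = TRAINABLE_COPIES * BYTES_PER_PARAM * (params[t] + op_params[t])
        stored_acts = acts[t] + op_acts[t]
        peak = max(peak, frozen + trainable + stored_acts)
    return peak


# Hypothetical four-block model (sizes are illustrative only).
params = [1_000_000] * 4                             # parameters per block
acts = [8_000_000, 4_000_000, 2_000_000, 1_000_000]  # activation bytes per block
op_params = [200_000] * 4                            # parameters per output module
op_acts = [100_000] * 4                              # activation bytes per output module

full = full_training_memory(params, acts)
peak = progressive_peak_memory(params, acts, op_params, op_acts)
print(full, peak)
```

With these made-up sizes, full training needs 79,000,000 bytes while the progressive peak is 32,300,000 bytes, reached at the last step, when the frozen prefix is largest. The numbers only illustrate the mechanism and are unrelated to the reductions the authors report.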

Abstract

This paper presents ProFL, a new framework that addresses the memory constraints of devices in federated learning (FL). Rather than updating the full model during local training, ProFL partitions the model into blocks based on its original architecture and trains the blocks progressively. It first trains the front blocks and safely freezes each one after it converges, which then triggers training of the next block. The model under training thus grows step by step until the full model has been trained. Because only part of the model is trained at any time, the peak memory footprint is reduced, making deployment feasible on heterogeneous devices. To preserve the feature representation of each block, training is divided into two stages: model shrinking and model growing. During the model shrinking stage, we design a dedicated output module for each block that helps it learn the expected feature representation, and we obtain initial model parameters. The resulting output modules and initial parameters are then used in the model growing stage, which progressively trains the full model. In addition, we propose a novel scalar metric to assess the learning status of each block, so that a block can be safely frozen after convergence and training of the next block can begin. Finally, we theoretically prove the convergence of ProFL and conduct extensive experiments on representative models and datasets to evaluate its effectiveness. The results show that ProFL reduces the peak memory footprint by up to 57.4% and improves model accuracy by up to 82.4%.
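The control flow of the growing stage can be sketched as a simple loop. The Python below only simulates that flow: each loop iteration stands for one FL round in which clients update the active block and its output module, and `converged` is a hypothetical callback standing in for ProFL's scalar freezing metric. Block indices are 0-based, and `GrowingStep`, `progressive_growing`, and `max_rounds` are illustrative names, not the authors' API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GrowingStep:
    block: int          # block trained at this step (theta_t)
    frozen: List[int]   # earlier blocks, kept fixed during this step
    absent: List[int]   # later blocks, not yet part of the trained sub-model
    rounds: int         # rounds spent before the block was frozen


def progressive_growing(num_blocks: int,
                        converged: Callable[[int, int], bool],
                        max_rounds: int = 50) -> List[GrowingStep]:
    """Simulate front-to-back training with freezing after convergence."""
    steps: List[GrowingStep] = []
    for t in range(num_blocks):
        rounds = 0
        # Each iteration stands in for one FL round on block t and its output module.
        while rounds < max_rounds:
            rounds += 1
            if converged(t, rounds):
                break  # block t is frozen; training of block t + 1 is triggered
        steps.append(GrowingStep(block=t,
                                 frozen=list(range(t)),
                                 absent=list(range(t + 1, num_blocks)),
                                 rounds=rounds))
    return steps


# Hypothetical convergence rule: block t "converges" after 3 + t rounds.
for step in progressive_growing(4, lambda t, r: r >= 3 + t):
    print(step)
```

The `max_rounds` cap is a safeguard added for the sketch so that a block whose metric never signals convergence is still frozen eventually; whether ProFL uses such a cap is not stated here.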

Paper Structure (17 sections, 5 equations, 6 figures, 3 tables)

Introduction
ProFL
High-Level Ideas
Progressive Training Paradigm
Progressive Model Shrinking
Block Freezing Determination
Convergence Analysis
Experiments
Experimental Settings
Performance Evaluation
Ablation Study
Understanding the Effective Movement
Understanding the Inclusiveness of ProFL
Compatibility and Scalability
Discussion
...and 2 more sections

Figures (6)

Figure 1: The workflow of ProFL. The global model is divided into multiple blocks. Progressive model shrinking is initially performed, followed by progressive model growing. Block freezing determination is employed to evaluate the training status of each block during both stages.
Figure 2: Progressive Model Growing. In each step $t$, only the corresponding block $\theta_{t}$ and output module $\theta_{op}$ are updated.
Figure 3: Progressive Model Shrinking. "Map" denotes integrating the information learned by a block into a basic layer.
Figure 4: Effective movement serves as a robust indicator of block convergence status in ResNet18. Here, "Step" denotes the effective movement of the sub-model at each step, and "Acc." denotes the test accuracy in the corresponding round.
Figure 5: Effective movement serves as a robust indicator of block convergence status in ResNet34. Here, "Step" denotes the effective movement of the sub-model at each step, and "Acc." denotes the test accuracy in the corresponding round.
...and 1 more figure
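Figures 4 and 5 use effective movement as a scalar indicator of block convergence, but its formula is not given in this summary. The Python sketch below therefore uses an assumed definition: over a window of recent update vectors for a block, the norm of their sum divided by the sum of their norms. By the triangle inequality the value lies in [0, 1]; it is 1 when consecutive updates point the same way and approaches 0 when they cancel out, as they do when parameters oscillate around a solution. The threshold in the hypothetical `should_freeze` helper is also an assumption.

```python
import math
from typing import List, Sequence


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def effective_movement(updates: List[Sequence[float]]) -> float:
    """ASSUMED definition: ||sum of updates|| / sum of ||update||, in [0, 1]."""
    net = [sum(coords) for coords in zip(*updates)]   # net displacement
    path_length = sum(l2_norm(u) for u in updates)    # total distance travelled
    if path_length == 0.0:
        return 0.0
    return l2_norm(net) / path_length


def should_freeze(updates: List[Sequence[float]], threshold: float = 0.1) -> bool:
    # Hypothetical rule: freeze once successive updates mostly cancel out.
    return effective_movement(updates) < threshold


drifting = [[1.0, 0.0]] * 3                   # updates keep moving the same way
oscillating = [[1.0, 0.0], [-1.0, 0.0]] * 2   # updates cancel each other
print(effective_movement(drifting), should_freeze(drifting))
print(effective_movement(oscillating), should_freeze(oscillating))
```

Here the drifting block scores 1.0 and keeps training, while the oscillating block scores 0.0 and would be frozen under the assumed threshold.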
