Heterogeneity-Aware Memory Efficient Federated Learning via Progressive Layer Freezing

Wu Yebo; Li Li; Tian Chunlin; Chang Tao; Lin Chi; Wang Cong; Xu Cheng-Zhong

TL;DR

The paper tackles memory bottlenecks in cross-device federated learning by introducing SmartFreeze, a progressive training framework that freezes model blocks in stages to reduce activation/gradient memory while preserving performance. It combines stage-based memory/time/data models, a pace controller, and a heterogeneity-aware participant selector (with RL-CD for community detection) to orchestrate block-wise training across devices. Key contributions include the block-perturbation–driven freezing criterion, RL-CD for client grouping, and extensive end-to-end and hardware evaluations showing memory reductions up to $82\%$, accuracy gains up to $83.1\%$, and speedups up to $2.02\times$, making large models feasible on memory-constrained devices. The approach offers practical impact by enabling higher-performing FL on edge devices with heterogeneous resources while preserving privacy.

Abstract

In this paper, we propose SmartFreeze, a framework that effectively reduces the memory footprint by conducting the training in a progressive manner. Instead of updating the full model in each training round, SmartFreeze divides the shared model into blocks, each consisting of a specified number of layers. It first trains the front block with a well-designed output module, safely freezes it after convergence, and then triggers the training of the next block. This process iterates until the whole model has been trained. In this way, the backward computation of the frozen blocks, and the memory needed to store their intermediate outputs and gradients, is saved. In addition to the progressive training framework, SmartFreeze has two core components: a pace controller and a participant selector. The pace controller monitors the training progress of each block at runtime and safely freezes it after convergence, while the participant selector selects the right devices to participate in the training of each block by jointly considering memory capacity and statistical and system heterogeneity. Extensive experiments evaluate SmartFreeze on both simulation and hardware testbeds. The results demonstrate that SmartFreeze reduces average memory usage by up to 82%. Moreover, it simultaneously improves model accuracy by up to 83.1% and accelerates training by up to 2.02×.
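The progressive schedule described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: `split_into_blocks`, `has_converged`, and `progressive_schedule` are hypothetical names, and the plateau test in `has_converged` is a simple assumed stand-in for the pace controller, whose actual criterion (based on block perturbation) is not reproduced here. The `train_round` callback is likewise a placeholder for one federated round on the active block.

```python
from typing import Callable, List


def split_into_blocks(layers: List[str], layers_per_block: int) -> List[List[str]]:
    """Group consecutive layers into blocks (the last block may be shorter)."""
    return [layers[i:i + layers_per_block]
            for i in range(0, len(layers), layers_per_block)]


def has_converged(history: List[float], window: int = 3, tol: float = 1e-2) -> bool:
    """Assumed stand-in for the pace controller: report convergence once the
    last `window` loss values differ by less than `tol`."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return max(recent) - min(recent) < tol


def progressive_schedule(blocks: List[List[str]],
                         train_round: Callable[[int, int], float],
                         max_rounds: int = 100) -> List[int]:
    """Train blocks front to back, moving on (i.e. freezing the current block)
    once it has converged. Returns the number of rounds spent on each block."""
    rounds_per_block = []
    for t in range(len(blocks)):
        history: List[float] = []
        for r in range(max_rounds):
            # One round that updates only block t and its output module.
            history.append(train_round(t, r))
            if has_converged(history):
                break
        # Block t is now treated as frozen; the next stage starts.
        rounds_per_block.append(len(history))
    return rounds_per_block


# Toy usage: a loss that halves every round, identical for every block.
layers = ["conv1", "conv2", "conv3", "conv4", "fc1", "fc2"]
blocks = split_into_blocks(layers, layers_per_block=2)
rounds = progressive_schedule(blocks, train_round=lambda t, r: 0.5 ** r)
```

In a real deployment the callback would aggregate client updates for the active block, and the convergence test would be replaced by whatever signal the pace controller actually monitors.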

Paper Structure (24 sections, 11 equations, 10 figures, 2 tables)

Introduction
Motivation and Observation
SmartFreeze Overview
System Design
Progressive Training with Layer Freezing
Pace Controller
Participant Selector
Stage-Based Memory Model
Stage-Based System Time Model
Stage-Based Data Model
Problem Formulation
Community Detection
Participant Selection
Evaluation
Experiment Setup
...and 9 more sections

Figures (10)

Figure 1: The trade-off between accuracy and training overhead of various system-level memory optimization methods applied in FL. A: vanilla FL, B: local training with a single device, C: gradient checkpointing, D: gradient accumulation, E: model compression (low-precision training, INT8).
Figure 2: CKA of different layers in FL with VGG16.
Figure 3: CKA with different degrees of Non-IID in FL.
Figure 4: Architecture and workflow of SmartFreeze. The Pace Controller and Participant Selector are deployed on the server side. Local Monitor is deployed on each device.
Figure 5: Progressive training with layer freezing. The global model $\Theta$ is split into $[\theta_1, \theta_2, \ldots, \theta_{T}]$ according to the model's structure. In each stage $t$, an additional output module $\theta_{op}$ is attached to $\theta_t$ for training. After training is completed, $\theta_t$ is frozen and training proceeds to the next stage. Iteratively, the trained part of the model grows until it reaches the target model.
...and 5 more figures
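
To make the staging in Figure 5 concrete, the sketch below gives a toy accounting of what must be held in memory at stage $t$: parameters of $\theta_1, \ldots, \theta_t$ and $\theta_{op}$ for the forward pass, but gradients and stored activations only for the trainable $\theta_t$ and $\theta_{op}$. It is not the paper's stage-based memory model; the `Block` type, both functions, and all sizes are hypothetical, and optimizer state and output-module activations are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    name: str
    param_mb: float       # parameter size in MB (made-up numbers)
    activation_mb: float  # activations kept for backward when trainable


def stage_memory_mb(blocks: List[Block], t: int, output_module_mb: float) -> float:
    """Toy training-memory estimate at stage t (0-indexed).

    Blocks 0..t-1 are frozen: they run forward only, so they contribute
    parameters but no gradients and no stored activations."""
    params = sum(b.param_mb for b in blocks[: t + 1]) + output_module_mb
    grads = blocks[t].param_mb + output_module_mb
    activations = blocks[t].activation_mb
    return params + grads + activations


def full_training_memory_mb(blocks: List[Block]) -> float:
    """Same accounting when every block is trained at once."""
    params = sum(b.param_mb for b in blocks)
    grads = params
    activations = sum(b.activation_mb for b in blocks)
    return params + grads + activations


# Early blocks: few parameters, large activations; later blocks: the reverse.
model = [Block("theta_1", 2.0, 40.0),
         Block("theta_2", 8.0, 20.0),
         Block("theta_3", 32.0, 10.0)]
per_stage = [stage_memory_mb(model, t, output_module_mb=1.0) for t in range(len(model))]
full = full_training_memory_mb(model)
```

With these made-up sizes, every stage needs less memory than end-to-end training, which matches the qualitative argument for freezing; the actual savings depend on the model and on how blocks are chosen.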
