Heterogeneity-Aware Memory Efficient Federated Learning via Progressive Layer Freezing
Wu Yebo, Li Li, Tian Chunlin, Chang Tao, Lin Chi, Wang Cong, Xu Cheng-Zhong
TL;DR
The paper tackles memory bottlenecks in cross-device federated learning (FL) by introducing SmartFreeze, a progressive training framework that freezes model blocks in stages to reduce activation and gradient memory while preserving performance. It combines stage-based memory, time, and data models with a pace controller and a heterogeneity-aware participant selector (using RL-CD for community detection) to orchestrate block-wise training across devices. Key contributions include the block-perturbation-driven freezing criterion, RL-CD for client grouping, and extensive end-to-end and hardware evaluations showing memory reductions of up to $82\%$, accuracy gains of up to $83.1\%$, and speedups of up to $2.02\times$, making large models feasible on memory-constrained devices. In practice, the approach enables higher-performing FL on edge devices with heterogeneous resources while preserving privacy.
Abstract
In this paper, we propose SmartFreeze, a framework that effectively reduces the memory footprint by conducting the training in a progressive manner. Instead of updating the full model in each training round, SmartFreeze divides the shared model into blocks, each consisting of a specified number of layers. It first trains the front block with a well-designed output module, safely freezes it after convergence, and then triggers the training of the next block. This process iterates until the whole model has been trained. In this way, the backward computation of the frozen blocks, and the memory needed to store their intermediate outputs and gradients, is saved. In addition to the progressive training framework, SmartFreeze has two core components: a pace controller and a participant selector. The pace controller monitors the training progress of each block at runtime and safely freezes it after convergence, while the participant selector chooses the right devices to train each block by jointly considering memory capacity, statistical heterogeneity, and system heterogeneity. Extensive experiments on both simulation and hardware testbeds evaluate the effectiveness of SmartFreeze. The results demonstrate that SmartFreeze reduces average memory usage by up to 82%. Moreover, it simultaneously improves model accuracy by up to 83.1% and accelerates the training process by up to 2.02×.
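The block-by-block schedule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `train_round` callback, and in particular the loss-plateau convergence test are assumptions made for illustration (the paper's pace controller uses its own criterion). The sketch only shows the control flow: split the layers into blocks, train the front block, freeze it once it is judged converged, then move on to the next block.

```python
from typing import Callable, List, Sequence, Tuple


def split_into_blocks(layers: Sequence[str], layers_per_block: int) -> List[List[str]]:
    """Partition an ordered list of layer names into consecutive blocks."""
    if layers_per_block <= 0:
        raise ValueError("layers_per_block must be positive")
    return [list(layers[i:i + layers_per_block])
            for i in range(0, len(layers), layers_per_block)]


def has_converged(losses: Sequence[float], window: int = 3, tol: float = 1e-2) -> bool:
    """Stand-in convergence test (an assumption, not the paper's criterion):
    the loss improved by less than `tol` over the last `window` rounds."""
    if len(losses) <= window:
        return False
    return losses[-window - 1] - losses[-1] < tol


def progressive_schedule(
    blocks: Sequence[Sequence[str]],
    train_round: Callable[[int, int], float],
    max_rounds: int,
) -> List[Tuple[int, int]]:
    """Train blocks front to back, freezing each one once it converges.

    `train_round(active_block, round_idx)` is a hypothetical callback that runs
    one federated round updating only the active block (and its output module)
    and returns the aggregated loss.

    Returns a list of (block_index, round_at_which_it_was_frozen).
    """
    frozen: List[Tuple[int, int]] = []
    active = 0
    losses: List[float] = []
    for round_idx in range(max_rounds):
        if active >= len(blocks):
            break  # every block has been trained and frozen
        losses.append(train_round(active, round_idx))
        if has_converged(losses):
            frozen.append((active, round_idx))
            active += 1   # trigger training of the next block
            losses = []   # start a fresh loss history for it
    return frozen
```

With a toy callback such as `lambda block, rnd: 1.0`, each block is frozen after four rounds, because the plateau test needs `window + 1` loss values. In a real system the callback would run local training on the selected devices, and frozen blocks would be excluded from the backward pass, which is where the memory saving comes from.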
