FedMef: Towards Memory-efficient Federated Dynamic Pruning

Hong Huang; Weiming Zhuang; Chen Chen; Lingjuan Lyu

TL;DR

FedMef tackles memory bottlenecks in federated dynamic pruning for cross-device FL with two mechanisms: budget-aware extrusion (BaE), which transfers information out of parameters marked for pruning within a given budget, and scaled activation pruning (SAP), which reduces activation memory. SAP uses Normalized Sparse Convolution (NSConv) to center activations around zero, which makes activation pruning effective when training without batch normalization (BN), particularly under small batch sizes. BaE mitigates post-pruning accuracy loss by regularizing low-magnitude weights during extrusion and coupling pruning with growth, yielding a specialized sparse model that maintains accuracy while lowering memory and computational demands. Across CIFAR-10, CINIC-10, and TinyImageNet with ResNet18 and MobileNetV2, FedMef achieves higher accuracy and up to 28.5% memory savings compared to state-of-the-art federated pruning baselines, which makes it practical for memory-constrained edge devices.
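The extrusion idea can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes an L2 penalty applied only to the smallest-magnitude weights, and the names `extrude_low_magnitude`, `num_marked`, `reg`, and `budget_steps` are hypothetical. It shows how such a penalty drives pruning candidates toward zero so that removing them later changes the model less.

```python
def extrude_low_magnitude(weights, grads, num_marked, lr=0.1, reg=1.0):
    """One SGD step with an extra L2 penalty on the smallest-magnitude weights.

    Assumption for illustration: the regularizer is 0.5 * reg * w^2, applied
    only to the `num_marked` weights that are current pruning candidates.
    """
    # Indices sorted by magnitude; the first `num_marked` are pruning candidates.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    marked = set(order[:num_marked])
    updated = []
    for i, (w, g) in enumerate(zip(weights, grads)):
        penalty_grad = reg * w if i in marked else 0.0  # gradient of the penalty
        updated.append(w - lr * (g + penalty_grad))
    return updated, marked


weights = [0.9, -0.05, 0.4, 0.02, -0.7]
grads = [0.0] * len(weights)  # zero task gradient isolates the penalty's effect
budget_steps = 20             # hypothetical step budget for the extrusion phase
for _ in range(budget_steps):
    weights, marked = extrude_low_magnitude(weights, grads, num_marked=2)
```

In a real training loop the task gradient would be non-zero, so the unmarked weights would adapt while the marked ones shrink; that adaptation is what the summary calls transferring information.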

Abstract

Federated learning (FL) promotes decentralized training while prioritizing data confidentiality. However, its application on resource-constrained devices is challenging due to the high demand for computation and memory resources to train deep learning models. Neural network pruning techniques, such as dynamic pruning, could enhance model efficiency, but directly adopting them in FL still poses substantial challenges, including post-pruning performance degradation, high activation memory usage, etc. To address these challenges, we propose FedMef, a novel and memory-efficient federated dynamic pruning framework. FedMef comprises two key components. First, we introduce the budget-aware extrusion that maintains pruning efficiency while preserving post-pruning performance by salvaging crucial information from parameters marked for pruning within a given budget. Second, we propose scaled activation pruning to effectively reduce activation memory footprints, which is particularly beneficial for deploying FL to memory-limited devices. Extensive experiments demonstrate the effectiveness of our proposed FedMef. In particular, it achieves a significant reduction of 28.5% in memory footprint compared to state-of-the-art methods while obtaining superior accuracy.
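The activation-pruning idea can be sketched on a single linear layer. This is a simplified illustration under stated assumptions, not the paper's code: the helper names (`prune_cache`, `linear_forward`, `linear_backward_weights`) and the `keep_ratio` parameter are hypothetical, and the layer is a plain matrix-vector product rather than a convolution. The forward pass uses the dense input, only a magnitude-pruned copy is cached, and the weight gradient is computed from that pruned cache.

```python
def prune_cache(activations, keep_ratio):
    """Keep the largest-magnitude activations, stored sparsely as {index: value}."""
    keep = max(1, int(len(activations) * keep_ratio))
    order = sorted(range(len(activations)),
                   key=lambda i: abs(activations[i]), reverse=True)
    return {i: activations[i] for i in order[:keep]}


def linear_forward(weights, x, keep_ratio):
    """Compute y = W x from the dense input, then cache a pruned copy of x."""
    y = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return y, prune_cache(x, keep_ratio)


def linear_backward_weights(grad_y, cache, num_inputs):
    """Approximate dL/dW[j][i] = grad_y[j] * x[i], with pruned x entries as zero."""
    return [[gy * cache.get(i, 0.0) for i in range(num_inputs)] for gy in grad_y]


x = [0.01, -1.5, 0.02, 0.8]      # mostly near-zero activations
weights = [[1.0, 2.0, 3.0, 4.0]]
y, cache = linear_forward(weights, x, keep_ratio=0.5)
grad_w = linear_backward_weights([1.0], cache, num_inputs=len(x))
```

The gradient error introduced by pruning is proportional to the magnitude of the dropped activations, which is why centering activations near zero before pruning matters.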

Paper Structure (33 sections, 1 theorem, 17 equations, 9 figures, 5 tables)

Introduction
Related Work
  Neural Network Pruning
  Federated Neural Network Pruning
  Activation Cache Compression
Methodology
  Problem Setup
  Design Principles
  Budget-aware Extrusion
  Scaled Activation Pruning
Evaluation
  Experimental Setup
  Performance Evaluation
  Ablation Study
Conclusion
...and 18 more sections

Key Result

Theorem 1

Given a CNN model structured in a ReLU-Conv sequence and $l$-th convolution layer performing operations as depicted by the forward pass in Equation eq:fp and NSConv in Equation eq:WSSConv, for the $i$-th channel of the activation value, $f(a^{l-1}_i)$, with its mean and variance denoted as $\mu_f, \

Figures (9)

Figure 1: Overview of FedMef for memory-efficient dynamic pruning in federated learning. FedMef proposes budget-aware extrusion (BaE), which preserves post-pruning accuracy by transferring essential information from low-magnitude parameters to the remaining ones and driving the low-magnitude parameters close to 0, and scaled activation pruning (SAP), which reduces memory usage. In FedMef, the server distributes a randomly pruned model to devices for collaborative training with SAP. After multiple training rounds, devices employ BaE for information transfer. The server then adjusts the model structure through magnitude pruning and growing. Newly activated parameters are initialized to 0.
Figure 2: Illustration of the training pipeline in the baseline and in the proposed scaled activation pruning method. During the forward pass, scaled activation pruning generates near-zero activations via Normalized Sparse Convolution (NSConv). The dense activation caches are then pruned based on magnitude. During the backward pass, these pruned caches are used to compute the gradients. Scaled activation pruning reduces the activation memory footprint by more than 3 times on the CIFAR-10 dataset with the MobileNetV2 model.
Figure 3: Distribution of the output from a convolution layer in ResNet18 using batch normalization layers (BatchNorm), without normalization layers (w/o Norm), and with the proposed Normalized Sparse Convolution (NSConv). The output experiences an internal covariate shift when training without normalization layers, whereas NSConv effectively mitigates this issue. A corresponding figure in the appendix shows the output distribution for all convolution layers in the ResNet18 model.
Figure 4: Comparison of accuracy and memory footprint of our proposed FedMef with the existing federated pruning methods on three datasets. The black dashed line marks the accuracy of training a full-size model (without pruning) in FedAvg. The memory footprint ratio is the memory footprint relative to training a full-size model in FedAvg.
Figure 5: FedMef's average accuracy and standard deviation are compared against: (left) various federated pruning frameworks when the training batch size is 1, where the black dashed line represents the accuracy of FedAvg framework; (middle) FedAvg and FedTiny across varying degrees of data heterogeneity; (right) modified versions of FedMef - one excluding BaE (similar to FedTiny's approach) and the other without SAP (omitting NSConv).
...and 4 more figures
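The structure adjustment described in the Figure 1 caption (magnitude pruning, then growing, with new parameters set to 0) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the caption does not state how positions are chosen for growth, so the `grow_scores` input is an assumption, and `prune_and_grow` and `num_adjust` are hypothetical names.

```python
def prune_and_grow(weights, mask, grow_scores, num_adjust):
    """Deactivate the smallest-magnitude active weights, then activate the
    inactive positions with the highest grow scores.

    Newly activated weights start at 0. The number of active weights stays the
    same as long as both sets contain at least `num_adjust` entries.
    `grow_scores` is an assumed growth criterion, one score per position.
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    to_prune = sorted(active, key=lambda i: abs(weights[i]))[:num_adjust]
    to_grow = sorted(inactive, key=lambda i: grow_scores[i], reverse=True)[:num_adjust]

    new_weights, new_mask = list(weights), list(mask)
    for i in to_prune:
        new_weights[i], new_mask[i] = 0.0, False
    for i in to_grow:
        new_weights[i], new_mask[i] = 0.0, True  # grown parameters start at 0
    return new_weights, new_mask


weights = [0.8, 0.0, -0.03, 0.0, 0.5, 0.0]
mask = [True, False, True, False, True, False]
grow_scores = [0.0, 0.9, 0.0, 0.1, 0.0, 0.4]
new_weights, new_mask = prune_and_grow(weights, mask, grow_scores, num_adjust=1)
```

Growth candidates are taken from the positions that were inactive before pruning, so a weight removed in this step cannot be reactivated in the same step.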

Theorems & Definitions (1)

Theorem 1
