A Theoretical Framework for Energy-Aware Gradient Pruning in Federated Learning

Emmanouil M. Athanasakos

Abstract

Federated Learning (FL) is constrained by the communication and energy limitations of decentralized edge devices. While gradient sparsification via Top-K magnitude pruning effectively reduces the communication payload, it remains inherently energy-agnostic. It assumes all parameter updates incur identical downstream transmission and memory-update costs, ignoring hardware realities. We formalize the pruning process as an energy-constrained projection problem that accounts for the hardware-level disparities between memory-intensive and compute-efficient operations during the post-backpropagation phase. We propose Cost-Weighted Magnitude Pruning (CWMP), a selection rule that prioritizes parameter updates based on their magnitude relative to their physical cost. We demonstrate that CWMP is the optimal greedy solution to this constrained projection and provide a probabilistic analysis of its global energy efficiency. Numerical results on a non-IID CIFAR-10 benchmark show that CWMP consistently establishes a superior performance-energy Pareto frontier compared to the Top-K baseline.
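The contrast between Top-K and a cost-weighted rule can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the score $|g_i| / c_i$ is an assumption based on the abstract's "magnitude relative to physical cost", the function names `top_k_mask` and `cost_weighted_mask` are hypothetical, and the gradient and cost values are made up for the example.

```python
import numpy as np


def top_k_mask(grad, k):
    """Energy-agnostic baseline: keep the k largest-magnitude entries."""
    mask = np.zeros(grad.shape, dtype=bool)
    if k > 0:
        # argpartition puts the k smallest values of -|g| (largest |g|) first.
        idx = np.argpartition(-np.abs(grad), k - 1)[:k]
        mask[idx] = True
    return mask


def cost_weighted_mask(grad, cost, energy_budget):
    """Greedy sketch: rank by |g_i| / c_i (assumed score) and keep the
    longest prefix whose cumulative cost fits within the energy budget."""
    score = np.abs(grad) / cost
    order = np.argsort(-score, kind="stable")
    cumulative = np.cumsum(cost[order])
    # Number of ranked entries whose running cost stays <= energy_budget.
    n_keep = int(np.searchsorted(cumulative, energy_budget, side="right"))
    mask = np.zeros(grad.shape, dtype=bool)
    mask[order[:n_keep]] = True
    return mask


# Toy values (hypothetical): the first coordinate is large but expensive.
grad = np.array([0.9, 1.0, 0.5, 0.1])
cost = np.array([4.0, 1.0, 1.0, 1.0])
budget = 2.0

topk = top_k_mask(grad, k=2)
cwmp = cost_weighted_mask(grad, cost, budget)

print("Top-K keeps:", topk.tolist(), "energy:", float(cost[topk].sum()))
print("Cost-weighted keeps:", cwmp.tolist(), "energy:", float(cost[cwmp].sum()))
```

In this toy case Top-K selects the expensive first coordinate because it looks only at magnitude, while the cost-weighted rule spends the same two slots on cheaper coordinates and stays within the budget.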

Paper Structure (11 sections, 2 theorems, 23 equations, 2 figures, 1 algorithm)

Introduction
Problem Formulation
Cost-Weighted Magnitude Pruning
Theoretical Justification
Experimental Validation
Federated Configuration
Memory-Centric Cost Model
Baselines
Conclusion and Future Work
Proof of Lemma 1
Proof of Proposition 2

Key Result

Lemma 1

Let $\mathbf{g} \in \mathbb{R}^d$ be a gradient vector and $\mathbf{c} \in \mathbb{R}_+^d$ be a strictly positive cost vector. Consider the energy-constrained gradient mass maximization over a support set $S \subseteq \{1, \dots, d\}$, subject to an energy budget $E_{\text{budget}} > 0$. The optimal selection policy for the continuous relaxation of this knapsack problem is to rank and select parameters in descending order of th
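For orientation, a generic continuous (fractional) knapsack of the kind the lemma refers to can be written as below. This is a sketch under stated assumptions: the per-coordinate value $v_i \ge 0$ is a placeholder (for example $v_i = |g_i|$), and the exact objective used in the paper is not reproduced here.

```latex
% Assumed generic form; v_i is a placeholder for the per-coordinate "gradient mass".
\begin{align}
\max_{\mathbf{x} \in [0,1]^d} \quad & \sum_{i=1}^{d} v_i \, x_i \\
\text{s.t.} \quad & \sum_{i=1}^{d} c_i \, x_i \le E_{\text{budget}}
\end{align}
```

For this relaxed problem, the classical greedy rule is optimal: sort coordinates by $v_i / c_i$ in descending order and fill until the budget is spent. With $v_i = |g_i|$, this gives a magnitude-per-unit-cost ranking.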

Figures (2)

Figure 1: Performance-Energy Pareto Frontier. CWMP is the dominant strategy in the extreme scarcity regime (1%) and maintains a superior accuracy-per-energy profile as the budget increases, avoiding the performance regression observed in Top-K.
Figure 2: Convergence Dynamics at 10% Sparsity. CWMP matches the convergence rate of the Top-K baseline while achieving a higher accuracy, demonstrating a more efficient allocation of the fixed communication budget.

Theorems & Definitions (2)

Lemma 1
Proposition 2
