A Theoretical Framework for Energy-Aware Gradient Pruning in Federated Learning

Emmanouil M. Athanasakos

Abstract

Federated Learning (FL) is constrained by the communication and energy limitations of decentralized edge devices. While gradient sparsification via Top-K magnitude pruning effectively reduces the communication payload, it remains inherently energy-agnostic. It assumes all parameter updates incur identical downstream transmission and memory-update costs, ignoring hardware realities. We formalize the pruning process as an energy-constrained projection problem that accounts for the hardware-level disparities between memory-intensive and compute-efficient operations during the post-backpropagation phase. We propose Cost-Weighted Magnitude Pruning (CWMP), a selection rule that prioritizes parameter updates based on their magnitude relative to their physical cost. We demonstrate that CWMP is the optimal greedy solution to this constrained projection and provide a probabilistic analysis of its global energy efficiency. Numerical results on a non-IID CIFAR-10 benchmark show that CWMP consistently establishes a superior performance-energy Pareto frontier compared to the Top-K baseline.
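The energy-agnostic behaviour of Top-K can be illustrated with a small sketch. The gradient values and per-parameter costs below are hypothetical, chosen only to show that a pure magnitude ranking can concentrate its selection on expensive coordinates; the helper names `top_k_indices` and `selection_energy` are illustrative and do not come from the paper.

```python
def top_k_indices(grad, k):
    """Return the indices of the k largest-magnitude entries (plain Top-K)."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    return sorted(order[:k])


def selection_energy(indices, cost):
    """Total cost of transmitting/updating the selected coordinates."""
    return sum(cost[i] for i in indices)


# Hypothetical gradient and cost vectors (assumptions for illustration only).
grad = [0.9, -0.8, 0.7, 0.1, -0.6, 0.5]
cost = [5.0, 5.0, 5.0, 1.0, 1.0, 1.0]  # first three coordinates are "expensive"

chosen = top_k_indices(grad, 3)
energy = selection_energy(chosen, cost)
print(chosen, energy)
```

In this toy setting Top-K picks the three expensive coordinates because it never consults `cost`, which is the behaviour the abstract describes as ignoring hardware realities.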

Paper Structure

This paper contains 11 sections, 2 theorems, 23 equations, 2 figures, and 1 algorithm.

Key Result

Lemma 1

Let $\mathbf{g} \in \mathbb{R}^d$ be a gradient vector and $\mathbf{c} \in \mathbb{R}_+^d$ be a strictly positive cost vector. Consider the energy-constrained gradient mass maximization over a support set $S \subseteq \{1, \dots, d\}$, subject to an energy budget $E_{\text{budget}} > 0$. The optimal selection policy for the continuous relaxation of this knapsack problem is to rank and select parameters in descending order of th
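A minimal sketch of a cost-weighted greedy selection in the spirit of CWMP follows. It assumes the ranking score is the ratio $|g_i| / c_i$, which is how the abstract's "magnitude relative to their physical cost" is read here; the exact score used in the paper is not visible in this excerpt. The function name `cwmp_select` and the example values are hypothetical. The sketch handles the integral case by skipping coordinates that no longer fit in the budget, whereas the lemma concerns the continuous relaxation.

```python
def cwmp_select(grad, cost, budget):
    """Greedy selection by magnitude-to-cost ratio under an energy budget.

    Assumption: the score is |g_i| / c_i. Coordinates are visited in
    descending score order and kept whenever they still fit in the budget.
    Returns the sorted selected indices and the energy actually spent.
    """
    if len(grad) != len(cost):
        raise ValueError("grad and cost must have the same length")
    if any(c <= 0 for c in cost):
        raise ValueError("costs must be strictly positive")

    order = sorted(
        range(len(grad)),
        key=lambda i: abs(grad[i]) / cost[i],
        reverse=True,
    )
    selected, spent = [], 0.0
    for i in order:
        if spent + cost[i] <= budget:
            selected.append(i)
            spent += cost[i]
    return sorted(selected), spent


# Hypothetical inputs (assumptions for illustration only).
grad = [0.9, -0.8, 0.7, 0.1, -0.6, 0.5]
cost = [5.0, 5.0, 5.0, 1.0, 1.0, 1.0]

selected, spent = cwmp_select(grad, cost, budget=7.0)
print(selected, spent)
```

With these toy values the two cheap, moderately large coordinates are taken first, and the remaining budget goes to the largest expensive coordinate. A magnitude-only ranking under the same budget would afford only a single expensive coordinate before running out.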

Figures (2)

  • Figure 1: Performance-Energy Pareto Frontier. CWMP is the dominant strategy in the extreme scarcity regime (1%) and maintains a superior accuracy-per-energy profile as the budget increases, avoiding the performance regression observed in Top-K.
  • Figure 2: Convergence Dynamics at 10% Sparsity. CWMP matches the convergence rate of the Top-K baseline while achieving higher accuracy, demonstrating a more efficient allocation of the fixed communication budget.

Theorems & Definitions (2)

  • Lemma 1
  • Proposition 2