Age-Aware Partial Gradient Update Strategy for Federated Learning Over the Air

Ruihao Du; Jiaqi Zhu; Zeshen Li; Howard H. Yang

TL;DR

The paper addresses the scalability challenge of over-the-air federated learning (OTA-FL) under a limited number of orthogonal waveforms by introducing AgeTop-$k$, a two-stage gradient sparsification method that first selects large-magnitude coordinates and then prioritizes stale coordinates using the Age of Information (AoI). The approach yields a compression ratio of $k/d$ and is analyzed as a $\gamma$-approximate compressor, with convergence guarantees for non-convex loss functions and an explicit bound that reveals the impact of compression, data heterogeneity, and channel noise. Theoretical findings show an $O(1/T)$ convergence rate with a non-vanishing steady-state error due to compression and wireless imperfections, while experiments on EMNIST and CIFAR-10 with CNN and ResNet-18 demonstrate faster convergence and higher accuracy than several baselines. This work provides a practical, theoretically grounded framework for scalable OTA-FL that leverages AoI-aware gradient updates to refresh stale yet important coordinates, reducing communication overhead without sacrificing convergence or performance.
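
For readers unfamiliar with the term, the following is the definition of a $\gamma$-approximate compressor commonly used in the gradient sparsification literature. It is given here as an assumption about what the summary refers to; the paper's own definition and the value of $\gamma$ it derives for AgeTop-$k$ may differ. An operator $\mathcal{C}: \mathbb{R}^d \to \mathbb{R}^d$ is called $\gamma$-approximate, with $\gamma \in (0, 1]$, if for every $x \in \mathbb{R}^d$

$$
\| \mathcal{C}(x) - x \|^2 \le (1 - \gamma) \, \| x \|^2 .
$$

As a reference point, plain Top-$k$ (keep the $k$ largest-magnitude entries and zero the rest) satisfies this inequality with $\gamma = k/d$: the $d - k$ discarded entries are the smallest in magnitude, so their squared sum is at most $(d - k)/d$ times $\| x \|^2$.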

Abstract

Frequent parameter exchanges between clients and the edge server incur substantial communication overhead, posing a critical bottleneck in federated learning (FL). By exploiting the superposition property of wireless waveforms, over-the-air (OTA) computation enables simultaneous analog aggregation of local updates, thereby reducing communication latency and improving spectrum efficiency. However, its scalability is constrained by the limited number of available orthogonal waveform resources, which are typically far fewer than the model dimension. To address this, we propose AgeTop-$k$, an age-aware gradient sparsification strategy that performs compression through a two-stage selection process. Specifically, the edge server first selects candidate gradient entries based on their magnitudes, and then further prioritizes them according to the Age of Information (AoI), which quantifies the staleness of updates. AoI tracking is achieved efficiently by maintaining an age vector at the edge server. We derive theoretical convergence guarantees for non-convex loss functions and demonstrate the efficacy of AgeTop-$k$ through extensive simulations.
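
The two-stage selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `agetop_k`, the candidate-set size `r`, the tie-breaking rule, and the age update (reset to zero when a coordinate is refreshed, otherwise incremented by one) are all hypothetical choices, and the sketch assumes the server has access to a vector of gradient magnitudes.

```python
import numpy as np


def agetop_k(grad, age, k, r):
    """Sketch of an age-aware two-stage coordinate selection.

    grad : 1-D float array of gradient entries (assumed available to the server)
    age  : 1-D int array, rounds since each coordinate was last updated
    k    : number of coordinates to keep (e.g. number of orthogonal waveforms)
    r    : hypothetical candidate-set size, with k <= r <= len(grad)
    """
    d = grad.shape[0]
    if not (0 < k <= r <= d):
        raise ValueError("require 0 < k <= r <= d")

    magnitude = np.abs(grad)

    # Stage 1: keep the r coordinates with the largest magnitude as candidates.
    candidates = np.argsort(magnitude, kind="stable")[-r:]

    # Stage 2: among the candidates, keep the k stalest ones.
    # np.lexsort sorts by the LAST key first, so age is the primary key
    # and magnitude only breaks ties between equally old coordinates.
    order = np.lexsort((magnitude[candidates], age[candidates]))
    selected = candidates[order[-k:]]

    mask = np.zeros(d, dtype=bool)
    mask[selected] = True

    # Sparsified gradient: unselected coordinates are zeroed out.
    sparse_grad = np.where(mask, grad, 0.0)

    # Assumed age update: refreshed coordinates reset, the rest grow older.
    new_age = np.where(mask, 0, age + 1)
    return sparse_grad, new_age, mask
```

With `r = k` the sketch reduces to plain Top-$k$, and with `r = d` it selects purely by age, so `r` controls how strongly staleness is weighed against magnitude in this illustration.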
