Over-the-air Federated Policy Gradient

Huiwen Yang; Lingying Huang; Subhrakanti Dey; Ling Shi

TL;DR

This work introduces over-the-air federated policy gradient (OA-FPG) for scalable multi-agent RL, where agents transmit analog gradient updates over a shared wireless channel and a central controller updates the policy parameter $\boldsymbol{\theta}$ from the superposed signal $\boldsymbol{v}_k$. The authors prove $L$-smoothness of the objective under standard RL assumptions and establish convergence with a linear speedup in the number of agents $N$ under favorable channel statistics, providing explicit complexity bounds to reach an $\epsilon$-approximate stationary point. They further analyze the impact of channel noise and fading, deriving conditions under which the performance degrades gracefully, and validate the approach via simulations on OpenAI-like tasks with Rayleigh and Nakagami channels. The results indicate substantial communication efficiency gains for large-scale federated RL without sacrificing convergence properties, with potential extensions to fully decentralized and collaborative setups.

Abstract

In recent years, over-the-air aggregation has been widely considered in large-scale distributed learning, optimization, and sensing. In this paper, we propose the over-the-air federated policy gradient algorithm, where all agents simultaneously broadcast an analog signal carrying local information to a common wireless channel, and a central controller uses the received aggregated waveform to update the policy parameters. We investigate the effect of noise and channel distortion on the convergence of the proposed algorithm, and establish the complexities of communication and sampling for finding an $\epsilon$-approximate stationary point. Finally, we present some simulation results to show the effectiveness of the algorithm.
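
The aggregation-and-update loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the local policy-gradient estimate is replaced by a noisy gradient of a toy quadratic loss, the fading gains are Rayleigh, the controller is assumed to know only the mean gain, and the names `oa_aggregate` and `run` as well as all parameter values are hypothetical.

```python
import numpy as np


def oa_aggregate(local_grads, rng, noise_std=0.1, rayleigh_scale=1.0):
    """Simulate one over-the-air round (illustrative channel model).

    local_grads has shape (n_agents, dim). The receiver observes only the
    faded sum of all transmitted signals plus additive noise.
    """
    n_agents, dim = local_grads.shape
    # One fading gain per agent, drawn fresh each round (assumption).
    gains = rng.rayleigh(scale=rayleigh_scale, size=n_agents)
    # Superposition on the shared channel: sum_i h_i * g_i.
    received = gains @ local_grads
    # Additive Gaussian receiver noise.
    return received + rng.normal(0.0, noise_std, size=dim)


def run(n_agents=50, dim=4, steps=300, step_size=0.05, seed=0):
    """Toy stand-in for the federated loop: minimize 0.5 * ||theta - target||^2."""
    rng = np.random.default_rng(seed)
    target = np.ones(dim)
    theta = np.zeros(dim)
    # Mean of a Rayleigh variable with scale 1 is sqrt(pi / 2).
    mean_gain = np.sqrt(np.pi / 2.0)
    for _ in range(steps):
        # Each agent's local estimate: true gradient plus sampling noise
        # (a placeholder for a trajectory-based policy-gradient estimator).
        grads = (theta - target) + rng.normal(0.0, 1.0, size=(n_agents, dim))
        v = oa_aggregate(grads, rng)
        # Controller update from the superposed signal, rescaled by N and
        # the mean channel gain (an assumed normalization).
        theta = theta - step_size * v / (n_agents * mean_gain)
    return theta, target


if __name__ == "__main__":
    theta, target = run()
    print("distance to target:", np.linalg.norm(theta - target))
```

In this sketch the controller never sees an individual agent's gradient, only the superposed vector, which is what lets the number of channel uses per round stay fixed as the number of agents grows.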

Paper Structure (12 sections, 6 theorems, 25 equations, 2 figures, 2 algorithms)

Introduction
Problem Formulation
Federated Reinforcement Learning
Policy Gradient
Over-the-air Aggregation
Main Results
Simulation Results
Conclusion
Proof of Lemma \ref{lm1}
Proof of Lemma \ref{lm2}
Proof of Theorem \ref{thm:convergence}
Proof of Theorem \ref{thm:convergence2}

Key Result

Lemma 1

\cite{chen2021communication} Under Assumption \ref{asm:lossbound} and Assumption \ref{asm:gradientbound}, the cumulative loss $J(\boldsymbol{\theta})$ is $L$-smooth, i.e., for any $\boldsymbol{\theta}_1,\boldsymbol{\theta}_2\in\mathbb{R}^d$, it holds that $\left\Vert \nabla J(\boldsymbol{\theta}_1) - \nabla J(\boldsymbol{\theta}_2) \right\Vert \le L \left\Vert \boldsymbol{\theta}_1 - \boldsymbol{\theta}_2 \right\Vert$.
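
For context, $L$-smoothness is usually applied through the standard descent inequality below. This is a textbook consequence of the Lipschitz-gradient property, shown here as a sketch rather than as the paper's own derivation; the update direction $\boldsymbol{g}_k$ and step size $\alpha$ are generic placeholders, not symbols taken from the paper's proofs.

```latex
% Descent lemma: valid for any L-smooth J and any theta_1, theta_2.
\begin{align}
J(\boldsymbol{\theta}_2)
  &\le J(\boldsymbol{\theta}_1)
   + \left\langle \nabla J(\boldsymbol{\theta}_1),\,
       \boldsymbol{\theta}_2 - \boldsymbol{\theta}_1 \right\rangle
   + \frac{L}{2} \left\Vert \boldsymbol{\theta}_2 - \boldsymbol{\theta}_1 \right\Vert^2 .
\intertext{Substituting an assumed update
  $\boldsymbol{\theta}_{k+1} = \boldsymbol{\theta}_k - \alpha \boldsymbol{g}_k$ gives}
J(\boldsymbol{\theta}_{k+1})
  &\le J(\boldsymbol{\theta}_k)
   - \alpha \left\langle \nabla J(\boldsymbol{\theta}_k),\, \boldsymbol{g}_k \right\rangle
   + \frac{L \alpha^2}{2} \left\Vert \boldsymbol{g}_k \right\Vert^2 .
\end{align}
```

The second line makes visible why noise and fading matter: they enter through both the inner product (bias of $\boldsymbol{g}_k$ relative to the true gradient) and the squared norm (its second moment).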

Figures (2)

Figure 1: Empirical cumulative reward under Rayleigh channel ($\alpha=0.0001$).
Figure 4: Empirical cumulative reward under Nakagami-$m$ channel ($\alpha=0.001$).
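
The two fading models named in the captions can be simulated with a short NumPy sketch. It relies on the standard fact that the squared amplitude of a Nakagami-$m$ variable with spread $\Omega$ is Gamma-distributed with shape $m$ and scale $\Omega/m$, and that $m=1$ recovers Rayleigh fading. The function name `sample_nakagami` and the parameter values are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np


def sample_nakagami(m, omega, size, rng):
    """Draw Nakagami-m fading amplitudes with E[h^2] = omega.

    Uses h^2 ~ Gamma(shape=m, scale=omega / m), then takes the square root.
    """
    power = rng.gamma(shape=m, scale=omega / m, size=size)
    return np.sqrt(power)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # m = 1 is the Rayleigh special case; larger m means milder fading.
    for m in (1.0, 2.0, 4.0):
        h = sample_nakagami(m, omega=1.0, size=100_000, rng=rng)
        print(f"m={m}: mean gain {h.mean():.3f}, mean power {np.mean(h**2):.3f}")
```

Gains drawn this way could be substituted for the per-agent channel coefficients in a simulated over-the-air round.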

Theorems & Definitions (13)

Lemma 1
Lemma 2
proof
Lemma 3
proof
Theorem 1
proof
Remark 1
Corollary 1
Remark 2
...and 3 more
