FedHPD: Heterogeneous Federated Reinforcement Learning via Policy Distillation

Wenzheng Jiang, Ji Wang, Xiongtao Zhang, Weidong Bao, Cheston Tan, Flint Xiaofeng Fan

TL;DR

This work addresses FedRL under agent heterogeneity and black-box settings by proposing Federated Heterogeneous Policy Distillation (FedHPD), which uses action probability distributions as the medium for knowledge sharing via a public state set ${\mathcal{S}}_p$. FedHPD operates in two stages: local policy updates with REINFORCE and periodic collaborative distillation where agents upload distributions, a central server computes a global consensus, and each agent minimizes a KL-regularized objective $J'(\theta_k)=J(\theta_k)-\lambda D_{KL}(\pi_{\theta_k} \| \pi_{global})$. Theoretical results show $J'(\theta)$ is $L$-smooth under standard assumptions and that FedHPD can achieve fast convergence with appropriate distillation settings (e.g., $\lambda=1$); experiments across CartPole, LunarLander, and InvertedPendulum demonstrate significant improvements in both system-wide and individual agent performance, while also revealing the impact of the distillation interval $d$ on learning dynamics. Overall, FedHPD provides a practical, privacy-preserving, and convergence-guaranteed approach to heterogeneous FedRL, with strong empirical validation and avenues for extension such as explicit sample complexity analyses and adaptive aggregation strategies.
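The collaborative distillation stage described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the server forms the global consensus by plain averaging of the uploaded distributions, and the function names (`aggregate_consensus`, `kl_to_consensus`, `regularized_objective`) and the toy probabilities are hypothetical.

```python
import math

def aggregate_consensus(agent_probs):
    """Average the agents' action distributions on each public state.

    agent_probs[k][s][a] is agent k's probability of action a on public
    state s. Plain averaging is an assumption made for this sketch.
    """
    num_agents = len(agent_probs)
    num_states = len(agent_probs[0])
    num_actions = len(agent_probs[0][0])
    return [
        [sum(agent_probs[k][s][a] for k in range(num_agents)) / num_agents
         for a in range(num_actions)]
        for s in range(num_states)
    ]

def kl_to_consensus(local_probs, global_probs, eps=1e-12):
    """Mean over public states of D_KL(pi_local || pi_global)."""
    total = 0.0
    for p_row, q_row in zip(local_probs, global_probs):
        for p, q in zip(p_row, q_row):
            if p > 0.0:  # terms with p = 0 contribute nothing
                total += p * math.log(p / max(q, eps))
    return total / len(local_probs)

def regularized_objective(j_value, local_probs, global_probs, lam=1.0):
    """J'(theta_k) = J(theta_k) - lambda * D_KL(pi_theta_k || pi_global)."""
    return j_value - lam * kl_to_consensus(local_probs, global_probs)

# Toy setting: 3 agents, 2 public states, 2 actions (made-up numbers).
agent_probs = [
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.8, 0.2], [0.2, 0.8]],
]
consensus = aggregate_consensus(agent_probs)
penalties = [kl_to_consensus(p, consensus) for p in agent_probs]
```

Only per-state action probabilities are exchanged here, so agents never share network weights or architectures, which is what makes the scheme compatible with the black-box, heterogeneous setting. In practice the penalty would be computed on differentiable policy outputs so that its gradient can be combined with the REINFORCE gradient.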

Abstract

Federated Reinforcement Learning (FedRL) improves sample efficiency while preserving privacy; however, most existing studies assume homogeneous agents, limiting its applicability in real-world scenarios. This paper investigates FedRL in black-box settings with heterogeneous agents, where each agent employs distinct policy networks and training configurations without disclosing their internal details. Knowledge Distillation (KD) is a promising method for facilitating knowledge sharing among heterogeneous models, but it faces challenges related to the scarcity of public datasets and limitations in knowledge representation when applied to FedRL. To address these challenges, we propose Federated Heterogeneous Policy Distillation (FedHPD), which solves the problem of heterogeneous FedRL by utilizing action probability distributions as a medium for knowledge sharing. We provide a theoretical analysis of FedHPD's convergence under standard assumptions. Extensive experiments corroborate that FedHPD shows significant improvements across various reinforcement learning benchmark tasks, further validating our theoretical findings. Moreover, additional experiments demonstrate that FedHPD operates effectively without the need for an elaborate selection of public datasets.

Paper Structure

This paper contains 28 sections, 4 theorems, 17 equations, 11 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

Under the policy-derivative assumption, $J(\theta)$ is $L$-smooth. That is, there exists a constant $L_J > 0$ such that, for all $\theta, \theta' \in \mathbb{R}^d$:

$$\|\nabla J(\theta) - \nabla J(\theta')\| \le L_J \|\theta - \theta'\|.$$
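As a reminder of why smoothness matters for convergence arguments of this kind, the following is a standard consequence of a gradient that is Lipschitz with constant $L_J$. It is a textbook fact rather than a statement taken from the paper, and the step size $\eta$ is a symbol introduced here for illustration. For all $\theta, \theta' \in \mathbb{R}^d$,

$$J(\theta') \ge J(\theta) + \langle \nabla J(\theta), \theta' - \theta \rangle - \frac{L_J}{2} \|\theta' - \theta\|^2 .$$

Taking an exact gradient-ascent step $\theta' = \theta + \eta \nabla J(\theta)$ with $0 < \eta \le 1/L_J$ then gives

$$J(\theta') \ge J(\theta) + \eta \left(1 - \frac{L_J \eta}{2}\right) \|\nabla J(\theta)\|^2 \ge J(\theta) + \frac{\eta}{2} \|\nabla J(\theta)\|^2 ,$$

so each such step increases the objective by an amount proportional to the squared gradient norm. REINFORCE uses stochastic gradient estimates, so an actual analysis also has to control the estimator's variance.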

Figures (11)

  • Figure 1: Illustration of FedHPD. 1. Generate the public state set $S_p$ through the virtual agent; 2. Agents perform local training; 3. Obtain action probability distributions on the public state set; 4. Aggregate knowledge to form a global consensus; 5. Compute the KL divergence to perform knowledge digestion.
  • Figure 2: Comparisons of system performance under different distillation intervals ($d$ = 5, 10, 20).
  • Figure 3: Comparisons of selected individual performance under different distillation intervals ($d$ = 5, 10, 20).
  • Figure 4: Comparisons of system performance between DPA-FedRL and FedHPD ($d$ = 5, 10, 20).
  • Figure 5: Comparisons of system performance under different $d$ values ($d$ = 2, 5, 10, 20, 40, 80).
  • ...and 6 more figures

Theorems & Definitions (5)

  • Proposition 1
  • Lemma 1: KL Divergence Smoothness
  • Theorem 1: Convergence of REINFORCE with Knowledge Distillation
  • Corollary 1: Fast Convergence of FedHPD
  • Remark