Local Environment Poisoning Attacks on Federated Reinforcement Learning
Evelyn Ma, Praneet Rathi, S. Rasoul Etesami
TL;DR
This work examines how Federated Reinforcement Learning (FRL) systems can be compromised by local environment poisoning applied to a subset of agents. It introduces a general bi-level optimization framework and concrete poisoning protocols for policy-based FRL, including an adversary design with public and private critics in actor-critic settings and reward poisoning for policy-gradient methods. The authors prove that, under certain conditions, the poisoned global objective decreases relative to clean FRL, and they validate the approach with extensive experiments on standard OpenAI Gym tasks using VPG and PPO, showing significant degradation in global performance compared to baselines. They also discuss a defense mechanism based on per-agent credit scoring, highlighting the practical need for robust FRL algorithms as federated RL deployments scale.
Abstract
Federated learning (FL) has become a popular tool for solving traditional Reinforcement Learning (RL) tasks. The multi-agent structure addresses the data hunger that is a major concern in traditional RL, while the federated mechanism protects the data privacy of individual agents. However, the federated mechanism also exposes the system to poisoning by malicious agents that can mislead the trained policy. Despite the advantages brought by FL, the vulnerability of Federated Reinforcement Learning (FRL) has not been well studied. In this work, we propose a general framework that characterizes FRL poisoning as an optimization problem and design a poisoning protocol that can be applied to policy-based FRL. Our framework can also be extended to FRL with actor-critic as the local RL algorithm by training a pair of private and public critics. We prove that our method can strictly degrade the global objective. We verify the effectiveness of our poisoning through extensive experiments targeting mainstream RL algorithms across OpenAI Gym environments covering a wide range of difficulty levels. In these experiments, we compare our proposed framework against clean (unpoisoned) training and baseline poisoning methods. The results show that the proposed framework successfully poisons FRL systems, reducing performance across various environments, and does so more effectively than the baseline methods. Our work provides new insights into the vulnerability of FL in RL training and poses new challenges for designing robust FRL algorithms.
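To make the threat model concrete, the following toy sketch illustrates how a poisoned local update can lower the clean global objective after federated averaging. It is not the protocol proposed in the paper: the scalar parameter, the quadratic objective, the sign-flipped reward, and all function names (`local_update`, `federated_round`, `run`) are assumptions chosen only for illustration.

```python
# Toy illustration only: a scalar "policy" theta, a clean objective
# J(theta) = -(theta - target)^2, and plain averaging at the server.
# A poisoned agent is modeled (as an assumption) by negating its reward,
# which flips the sign of its local gradient.

def local_update(theta, target, lr=0.1, steps=5, poisoned=False):
    """Run a few local gradient-ascent steps starting from the global theta."""
    sign = -1.0 if poisoned else 1.0
    for _ in range(steps):
        grad = -2.0 * (theta - target)  # gradient of the clean objective
        theta = theta + lr * sign * grad
    return theta


def federated_round(theta, target, n_agents, n_poisoned):
    """Each agent updates locally; the server averages the results."""
    updates = [
        local_update(theta, target, poisoned=(i < n_poisoned))
        for i in range(n_agents)
    ]
    return sum(updates) / len(updates)


def run(n_agents=4, n_poisoned=0, rounds=20, theta0=0.0, target=1.0):
    """Return the clean global objective after the given number of rounds."""
    theta = theta0
    for _ in range(rounds):
        theta = federated_round(theta, target, n_agents, n_poisoned)
    return -(theta - target) ** 2


if __name__ == "__main__":
    for k in range(3):
        print(f"poisoned agents: {k}, clean objective: {run(n_poisoned=k):.6g}")
```

In this toy setting, the clean objective reached after training gets worse as more of the four agents are poisoned. The attack studied in the paper is more sophisticated, since it chooses the poisoning by solving an optimization problem rather than simply negating the reward.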
