Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

Yongyuan Liang; Yanchao Sun; Ruijie Zheng; Xiangyu Liu; Benjamin Eysenbach; Tuomas Sandholm; Furong Huang; Stephen McAleer

TL;DR

The paper addresses robustness in reinforcement learning under temporally-coupled perturbations, a realistic threat that traditional i.i.d. attacks do not capture. It introduces GRAD, a PSRO-based game-theoretic framework that treats robust RL as a two-player zero-sum game and approximates a Nash equilibrium to defend against both temporally-coupled and non-temporally-coupled attacks, using an $\bar{\epsilon}$-temporal constraint to model correlations across time steps. The paper formally defines temporally-coupled perturbations, shows that GRAD converges to an approximate equilibrium, and reports improved robustness on five MuJoCo continuous-control tasks with little loss in natural performance. The approach adapts to diverse adversaries and attack domains, though it inherits the computational cost of PSRO; future work includes improving scalability and extending the method to pixel-based or real-world settings.
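The TL;DR describes GRAD as PSRO-based. To make the population-based loop concrete, here is a minimal sketch of the generic PSRO (double-oracle) pattern on a toy zero-sum matrix game; it is not GRAD itself. The payoff matrix, the exact best-response oracles (which in GRAD's setting would be RL training of a new agent or adversary policy), the fictitious-play meta-solver, and the helper names `restricted`, `fictitious_play`, and `psro` are all illustrative assumptions.

```python
# Hypothetical sketch: a PSRO / double-oracle loop on a small zero-sum
# matrix game. The payoff matrix stands in for the agent-vs-adversary RL
# game; exact best responses stand in for RL training of new policies.

def restricted(payoff, rows, cols):
    """Sub-matrix of payoffs for the current agent and adversary populations."""
    return [[payoff[i][j] for j in cols] for i in rows]


def fictitious_play(sub, iters=5000):
    """Approximate equilibrium of a zero-sum matrix game (row player maximizes)."""
    n_rows, n_cols = len(sub), len(sub[0])
    row_counts, col_counts = [0] * n_rows, [0] * n_cols
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iters):
        # Row player best-responds to the column player's empirical mixture.
        row_vals = [sum(sub[i][j] * col_counts[j] for j in range(n_cols)) for i in range(n_rows)]
        row_counts[max(range(n_rows), key=row_vals.__getitem__)] += 1
        # Column player (minimizer) best-responds to the row player's mixture.
        col_vals = [sum(sub[i][j] * row_counts[i] for i in range(n_rows)) for j in range(n_cols)]
        col_counts[min(range(n_cols), key=col_vals.__getitem__)] += 1
    return ([c / sum(row_counts) for c in row_counts],
            [c / sum(col_counts) for c in col_counts])


def psro(payoff, max_iters=10):
    """Grow agent (row) and adversary (column) populations until no new best response appears."""
    rows, cols = [0], [0]  # each population starts with one pure strategy
    for _ in range(max_iters):
        x, y = fictitious_play(restricted(payoff, rows, cols))  # meta-strategies
        # Agent best response to the adversary meta-strategy, over ALL strategies.
        row_br = max(range(len(payoff)),
                     key=lambda i: sum(q * payoff[i][j] for q, j in zip(y, cols)))
        # Adversary best response (minimizes agent payoff) to the agent meta-strategy.
        col_br = min(range(len(payoff[0])),
                     key=lambda j: sum(p * payoff[i][j] for p, i in zip(x, rows)))
        if row_br in rows and col_br in cols:
            break  # no new strategy to add: restricted equilibrium carries over
        if row_br not in rows:
            rows.append(row_br)
        if col_br not in cols:
            cols.append(col_br)
    x, y = fictitious_play(restricted(payoff, rows, cols))
    return rows, cols, x, y


# Rock-paper-scissors: PSRO must add all three strategies to reach the
# uniform equilibrium.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
rows, cols, x, y = psro(RPS)
print(sorted(rows), [round(p, 2) for p in x])
```

In rock-paper-scissors both populations start with "rock"; the loop adds "paper" and then "scissors", after which no best response lies outside the populations and the meta-strategies are close to uniform. The stopping test reflects the idea that PSRO stops expanding once no new strategy improves on the current meta-strategies.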

Abstract

Deploying reinforcement learning (RL) systems requires robustness to uncertainty and model misspecification, yet prior robust RL methods typically only study noise introduced independently across time. However, practical sources of uncertainty are usually coupled across time. We formally introduce temporally-coupled perturbations, presenting a novel challenge for existing robust RL methods. To tackle this challenge, we propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game. By finding an approximate equilibrium within this game, GRAD optimizes for general robustness against temporally-coupled perturbations. Experiments on continuous control tasks demonstrate that, compared with prior methods, our approach achieves a higher degree of robustness to various types of attacks on different attack domains, both in settings with temporally-coupled perturbations and decoupled perturbations.

Paper Structure

This paper contains 20 sections, 1 theorem, 4 equations, 13 figures, 2 tables, and 5 algorithms.

Introduction
Preliminaries
Robustness to Temporally-Coupled Attacks
Temporally-coupled Attack
GRAD: Game-Theoretic Approach for Adversarial Defense
Experiments
Case I: Robustness against state perturbations.
Case II: Robustness against action uncertainty.
Case III: Robustness against mixed adversaries.
Related Work
Conclusion and Discussion
Proof of Proposition 3.3
Additional Related Work
Experiment Details and Additional Results
Implementation details
...and 5 more sections

Key Result

Proposition 3.3

For a finite-horizon MDP with a fixed number of discrete actions, GRAD converges to an approximate Nash Equilibrium (NE) of the two-player zero-sum adversarial game.
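Proposition 3.3 is stated in terms of an approximate Nash equilibrium. For a finite zero-sum game, one standard way to quantify "approximate" is the duality gap, also called exploitability. The sketch below computes it for a mixed-strategy profile in a toy matrix game; the matrix-game setting and the function name `exploitability` are illustrative assumptions, not the paper's evaluation procedure.

```python
# Illustrative sketch: the duality gap (exploitability) of a mixed-strategy
# profile in a two-player zero-sum matrix game. The matrix game stands in for
# the agent-vs-adversary game; it is not the paper's evaluation code.

def exploitability(payoff, x, y):
    """Return max_i (A y)_i - min_j (x^T A)_j for a row-maximizing game A.

    The gap is zero exactly at a Nash equilibrium. If it is at most eps, neither
    player can gain more than eps by deviating, i.e. (x, y) is an
    eps-approximate Nash equilibrium.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Best payoff the row player (agent) could obtain against y.
    best_row = max(sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows))
    # Lowest payoff the column player (adversary) could force against x.
    best_col = min(sum(payoff[i][j] * x[i] for i in range(n_rows)) for j in range(n_cols))
    return best_row - best_col


RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock-paper-scissors
print(exploitability(RPS, [1/3] * 3, [1/3] * 3))  # uniform play: gap of 0
print(exploitability(RPS, [1, 0, 0], [1, 0, 0]))  # both always play rock: 2
```

The gap splits into the agent's gain from deviating plus the adversary's gain from deviating, so driving it toward zero is one way to check that a PSRO-style procedure has reached an approximate equilibrium.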

Figures (13)

Figure 1: The robust GRAD agent (top) and the state-of-the-art robust WocaR-RL agent [liang2022efficient] (bottom) exhibit distinct learned behaviors. Under standard non-temporally-coupled attacks, both agents maintain basic body stability, with the GRAD agent additionally trying to avoid lateral rotations. WocaR-RL focuses on enhancing robustness in worst-case scenarios, but our experiments reveal that it is vulnerable to temporally-coupled attacks, under which it tends to fall to one side. In contrast, GRAD shows superior robustness in both non-temporally-coupled and temporally-coupled adversarial settings.
Figure 2: Standard perturbations and temporally-coupled perturbations in a 2D example.
Figure 3: Average episode rewards ± standard deviation over 100 episodes under the strongest non-temporally-coupled and temporally-coupled state attacks for state-robust baselines and GRAD on five control tasks.
Figure 4: Average episode rewards ± standard deviation over 100 episodes for GRAD and action-robust models against the strongest non-temporally-coupled and temporally-coupled action perturbations on five MuJoCo tasks.
Figure 5: Robustness to Model Uncertainty Across Various $\alpha$ Values. The noise probability $\alpha$ is the probability that randomly sampled noise replaces the action initially selected by the agent.
...and 8 more figures
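The Figure 5 caption defines the model-uncertainty setting through a noise probability $\alpha$. A minimal sketch of that perturbation model follows, assuming the replacement noise is sampled uniformly within per-dimension action bounds; the uniform distribution, the bounds arguments, and the function name `noisy_action` are illustrative assumptions, since the caption does not specify the noise distribution.

```python
import random


def noisy_action(policy_action, low, high, alpha, rng=random):
    """With probability alpha, replace the policy's action with random noise.

    Assumption (hypothetical): noise is drawn uniformly from [low[k], high[k]]
    in each action dimension k; the paper's exact noise model may differ.
    """
    if rng.random() < alpha:
        return [rng.uniform(lo, hi) for lo, hi in zip(low, high)]
    return list(policy_action)  # keep the initially selected action


# Empirical replacement rate for alpha = 0.2 (the policy action 5.0 lies
# outside the bounds, so any replaced action differs from it).
rng = random.Random(0)
trials = 10_000
hits = sum(noisy_action([5.0], [-1.0], [1.0], 0.2, rng) != [5.0] for _ in range(trials))
print(hits / trials)  # should be close to alpha = 0.2
```

Sweeping `alpha` from 0 to 1 moves the agent from acting exactly as its policy dictates to acting fully at random, which is the axis along which Figure 5 measures robustness.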

Theorems & Definitions (4)

Definition 3.1: $\epsilon$-Admissible Adversary Perturbations
Definition 3.2: $\bar{\epsilon}$-Temporally-coupled Perturbations
Proposition 3.3
Proof of Proposition 3.3
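Definitions 3.1 and 3.2 are listed only by name here. Assuming that an $\epsilon$-admissible perturbation satisfies $\|\delta_t\| \le \epsilon$ and that the $\bar{\epsilon}$-temporal constraint additionally bounds the change between consecutive perturbations, $\|\delta_t - \delta_{t-1}\| \le \bar{\epsilon}$, the sketch below clips an arbitrary perturbation sequence into that set. The L-infinity norm, the greedy step-by-step clipping, and the function name `couple_perturbations` are illustrative assumptions.

```python
# Hypothetical sketch of epsilon-admissible, epsilon_bar-temporally-coupled
# perturbations. Assumptions: L-infinity norms and the constraints
# |delta_t| <= eps and |delta_t - delta_{t-1}| <= eps_bar per coordinate.

def couple_perturbations(raw, eps, eps_bar):
    """Greedily clip raw per-step perturbation vectors into the constraint set.

    Each step is clipped onto the intersection of the eps-box around zero and
    the eps_bar-box around the previous (already clipped) perturbation. This is
    a step-by-step projection, not a joint projection of the whole sequence.
    """
    out = []
    for t, delta in enumerate(raw):
        clipped = []
        for k, d in enumerate(delta):
            lo, hi = -eps, eps  # epsilon-admissible bound
            if t > 0:
                prev = out[-1][k]
                # Temporal coupling: stay within eps_bar of the previous step.
                # The interval is never empty because |prev| <= eps.
                lo, hi = max(lo, prev - eps_bar), min(hi, prev + eps_bar)
            clipped.append(min(max(d, lo), hi))
        out.append(clipped)
    return out


# A sign-flipping adversary is reduced to a slow drift.
raw = [[0.5], [-0.5], [0.5], [-0.5]]
print(couple_perturbations(raw, eps=0.1, eps_bar=0.05))
```

Without the $\bar{\epsilon}$ term, an adversary could jump between $+\epsilon$ and $-\epsilon$ at every step; with it, the perturbation can move by at most $\bar{\epsilon}$ per step, which is the kind of temporal correlation the definitions are meant to capture.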
