Regret-Based Defense in Adversarial Reinforcement Learning

Roman Belaire; Pradeep Varakantham; Thanh Nguyen; David Lo

TL;DR

This work tackles the vulnerability of deep reinforcement learning to adversarial observations by introducing regret-based adversarial defense (RAD). It formalizes regret through Cumulative Contradictory Expected Regret (CCER) and derives scalable optimization strategies, including RAD-DRN (value-iteration style), RAD-PPO (policy-gradient), and RAD-CHT (adversary-reactive, based on Cognitive Hierarchy Theory). Across MuJoCo, Atari, and Highway benchmarks, RAD methods consistently outperform baselines under greedy and strategic attacks while balancing nominal performance. The results demonstrate that minimizing regret, rather than maximizing expected return, yields policies that are robust to unseen or multi-step perturbations, with practical implications for safety-critical RL deployment.

Abstract

Deep Reinforcement Learning (DRL) policies have been shown to be vulnerable to small adversarial noise in observations. Such adversarial noise can have disastrous consequences in safety-critical environments. For instance, a self-driving car that receives adversarially perturbed sensory observations about nearby signs (e.g., a stop sign physically altered to be perceived as a speed limit sign) or objects (e.g., cars altered to be recognized as trees) can cause a fatal accident. Existing approaches for making RL algorithms robust to an observation-perturbing adversary have focused on reactive methods that iteratively improve against adversarial examples generated at each iteration. While such approaches have been shown to provide improvements over regular RL methods, they are reactive and can fare significantly worse if certain categories of adversarial examples are not generated during training. To that end, we pursue a more proactive approach that relies on directly optimizing a well-studied robustness measure, regret, instead of expected value. We provide a principled approach that minimizes maximum regret over a "neighborhood" of observations around the received observation. Our regret criterion can be used to modify existing value- and policy-based Deep RL methods. We demonstrate that our approaches provide a significant improvement in performance across a wide variety of benchmarks against leading approaches for robust Deep RL.
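To make the idea of minimizing maximum regret over a neighborhood concrete, the following is a minimal sketch of the generic minimax-regret decision rule, not the paper's CCER objective or any of the RAD algorithms. It assumes a hypothetical table `q_values[i][a]` holding the estimated return of action `a` if candidate state `i` in the neighborhood is the true one; the function names and the numbers are illustrative assumptions only.

```python
from typing import List, Sequence


def max_regrets(q_values: Sequence[Sequence[float]]) -> List[float]:
    """Worst-case regret of each action over a neighborhood of candidate states.

    q_values[i][a] is a hypothetical estimate of the return of action `a`
    when candidate state `i` is the true state. The regret of `a` under
    state `i` is the gap to the best action for that state.
    """
    n_actions = len(q_values[0])
    best_per_state = [max(row) for row in q_values]
    return [
        max(best - row[a] for best, row in zip(best_per_state, q_values))
        for a in range(n_actions)
    ]


def minimax_regret_action(q_values: Sequence[Sequence[float]]) -> int:
    """Index of the action whose worst-case regret is smallest."""
    regrets = max_regrets(q_values)
    return min(range(len(regrets)), key=regrets.__getitem__)


def greedy_action(q_row: Sequence[float]) -> int:
    """Index of the action that maximizes value for a single trusted state."""
    return max(range(len(q_row)), key=q_row.__getitem__)


# Illustrative numbers (assumed): row 0 is the received observation,
# rows 1 and 2 are nearby states the adversary might be hiding.
q = [
    [10.0, 8.0, 9.0],
    [0.0, 7.0, 6.0],
    [9.0, 8.0, 2.0],
]

regrets = max_regrets(q)              # [7.0, 2.0, 7.0]
robust = minimax_regret_action(q)     # action 1
naive = greedy_action(q[0])           # action 0
```

In this toy table, trusting the received observation selects action 0, which loses 7 units of return if the true state is row 1. The minimax-regret choice, action 1, is never more than 2 units worse than the best action for any state in the neighborhood.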

Paper Structure

This paper contains 20 sections, 3 theorems, 18 equations, 2 figures, 8 tables, and 1 algorithm.

Introduction
Related Work
RL with Adversarial Observations
Regret-based Adversarial Defense (RAD)
Regret Approximation: CCER
Approach 1: RAD-DRN
Approach 2: RAD-PPO
CCER-based Advantage
Approach 3: RAD-CHT
Experiments
Experimental Setup
Worst Case Policy Attack
Results
Conclusion
Main Results Extended
...and 5 more sections

Key Result

Proposition 1

At time step $t$, the CCER of a policy $\pi$ at $z_t$, i.e., $\delta_{\texttt{CCER}}^{\pi}(z_t)$, is minimized if $\pi$ includes the CCER-minimizing policy $\pi^\bot_{[t+1,H]}$ from $t+1$ onward. Formally,

Figures (2)

Figure 1: The performance of robust RL methods against strategic adversaries. The y-axis represents the score and the x-axis represents the intensity of the attack.
Figure 2: The performance of robust RL methods against strategic adversaries. The y-axis represents the score and the x-axis represents the intensity of the attack.

Theorems & Definitions (5)

Definition 1: Regret
Definition 2: Minimax Regret Policy
Proposition 1: Optimal Substructure property
Proposition 2
Proposition 3
