State-Aware Perturbation Optimization for Robust Deep Reinforcement Learning

Zongyuan Zhang, Tianyang Duan, Zheng Lin, Dong Huang, Zihan Fang, Zekai Sun, Ling Xiong, Hongbin Liang, Heming Cui, Yong Cui

TL;DR

This work addresses the vulnerability of deep reinforcement learning (DRL) policies in robotic control to environmental perturbations by introducing the Adversarial Victim-Dynamics Markov Decision Process (AVD-MDP) for modeling attacker–victim interactions over time. Building on this theory, the authors present STAR, a state-aware white-box attack that combines a soft-mask state-targeting mechanism with an information-theoretic objective to maximize mutual information between perturbations, states, and actions, thereby achieving stealthy perturbations and dispersed state visitation under a fixed budget. The approach is evaluated on quadruped locomotion tasks in RaiSim with PPO-based victims, where STAR consistently outperforms existing white-box attacks in degrading reward, forward velocity, and stability, and also enables effective adversarial defense through training. The results demonstrate both the practical potency of STAR and its value as a framework for rigorously testing and improving DRL robustness in real-world robotic systems.

Abstract

Recently, deep reinforcement learning (DRL) has emerged as a promising approach for robotic control. However, the deployment of DRL in real-world robots is hindered by its sensitivity to environmental perturbations. Existing white-box adversarial attacks, which are used to evaluate DRL robustness, rely on local gradient information and apply uniform perturbations across all states, so they fail to account for temporal dynamics and state-specific vulnerabilities. To address this challenge, we first conduct a theoretical analysis of white-box attacks in DRL by establishing the adversarial victim-dynamics Markov decision process (AVD-MDP), from which we derive the necessary and sufficient conditions for a successful attack. Based on this, we propose a selective state-aware reinforcement adversarial attack method, named STAR, to optimize perturbation stealthiness and state-visitation dispersion. STAR first employs a soft mask-based state-targeting mechanism to minimize redundant perturbations, enhancing stealthiness and attack effectiveness. Then, it incorporates an information-theoretic optimization objective to maximize mutual information between perturbations, environmental states, and victim actions, ensuring a dispersed state-visitation distribution that steers the victim agent into vulnerable states for maximum return reduction. Extensive experiments demonstrate that STAR outperforms state-of-the-art benchmarks.
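
To make the soft-mask idea in the abstract concrete, the sketch below gates a bounded perturbation per state dimension with a mask in (0, 1), so that dimensions the mask considers unimportant are barely perturbed. This is an illustration only, not the authors' implementation: the sigmoid parameterization of the mask, the FGSM-style sign-gradient direction, and the names `soft_masked_perturbation`, `mask_logits`, and `epsilon` are all assumptions introduced here.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function, mapping logits to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def soft_masked_perturbation(state, grad, mask_logits, epsilon):
    """Apply a sign-gradient perturbation scaled per dimension by a soft mask.

    Assumptions (hypothetical, for illustration):
      - `grad` is some attack gradient w.r.t. the state, supplied by the caller.
      - `mask_logits` holds one logit per state dimension; in a real attack
        these would be learned, here they are fixed by hand.
      - `epsilon` is an L-infinity budget on the perturbation.
    """
    mask = sigmoid(mask_logits)              # soft mask in (0, 1)
    delta = epsilon * mask * np.sign(grad)   # |delta_i| <= epsilon for every i
    return state + delta, delta, mask


# Toy 4-dimensional state with hand-picked gradient and mask logits.
state = np.array([0.5, -1.2, 0.3, 2.0])
grad = np.array([0.8, -0.1, 0.0, -2.5])
mask_logits = np.array([4.0, -4.0, 0.0, 4.0])
epsilon = 0.1

perturbed, delta, mask = soft_masked_perturbation(state, grad, mask_logits, epsilon)
print("mask:     ", np.round(mask, 3))
print("delta:    ", np.round(delta, 4))
print("perturbed:", np.round(perturbed, 4))
```

Because the mask is strictly below 1, every component of the perturbation stays within the budget `epsilon`, while dimensions with strongly negative logits receive almost no perturbation. A uniform attack corresponds to the special case where the mask is the same for every dimension.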

Paper Structure

This paper contains 18 sections, 2 theorems, 34 equations, 11 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

(Necessary Condition for Attack Success) Given a victim agent's policy $\mu \left ( \cdot \mid s \right )$ and an adversarial policy $\nu \left ( \cdot \mid s,\mu \right )$, a necessary condition for a successful attack is given by: where $\delta \left ( \mu ,\nu \right ) \left [ s \right ]=\sum _s d^{\mu \oplus \nu} \left ( s \right )\sum _a\left ( \mu \oplus \nu \left ( a\mid s \right ) -\mu

Figures (11)

  • Figure 1: Performance comparison of three attack strategies in a four-wheeled robot navigation task: No Attack (top), Uniform Attack (middle), and Front-Dominant Attack (bottom). Each row presents the attack intensity distribution (left), navigation trajectory (middle), and accumulated reward (right). The Front-Dominant Attack exhibits the highest efficacy in disrupting navigation by concentrating perturbations in the frontal direction.
  • Figure 2: The workflow of STAR.
  • Figure 3: The Adversarial Example Generation Framework of STAR.
  • Figure 4: Aliengo and ANYmal, quadruped robots in the RaiSim platform.
  • Figure 5: Training trajectories of torque control, velocity control, and reward of the victim agents. (a) Aliengo locomotion. (b) ANYmal locomotion.
  • ...and 6 more figures

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Theorem 1
  • proof
  • Theorem 2
  • proof