
SEBA: Sample-Efficient Black-Box Attacks on Visual Reinforcement Learning

Tairan Huang, Yulin Jin, Junxu Liu, Qingqing Ye, Haibo Hu

TL;DR

This work tackles the vulnerability of visual reinforcement learning to adversarial perturbations under a strict black-box setting. It introduces SEBA, a framework that combines a shadow Q model to estimate long-horizon rewards under perturbations, a GAN-based perturbation generator for perceptually realistic attacks, and a learned world model that enables synthetic rollouts and dramatically reduces real-environment queries; training follows a two-stage alternating optimization. SEBA demonstrates superior attack effectiveness, visual imperceptibility, and query efficiency on MuJoCo and Atari benchmarks, and extends to targeted control tasks, illustrating its practical applicability for robustness evaluation in embodied AI. Because it requires neither gradient access to the victim nor RL optimization over the high-dimensional perturbation space, SEBA provides a principled, deployable approach to assess and stress-test visual RL systems under constrained access.
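
The shadow Q model's role, estimating long-horizon reward under perturbation, can be illustrated with a standard one-step temporal-difference (TD) regression target. This is a minimal sketch under stated assumptions: the function names `shadow_q_td_targets` and `shadow_q_loss`, the discount `gamma`, and the choice of a plain TD(0) target are illustrative, not details confirmed by this summary. The rewards would be those the victim collects while acting on perturbed observations, possibly inside world-model rollouts.

```python
import numpy as np

def shadow_q_td_targets(rewards, next_q, dones, gamma=0.99):
    """One-step TD targets for a hypothetical shadow Q model.

    rewards: victim rewards r_t observed under perturbed observations
    next_q:  shadow-Q estimates Q(s_{t+1}, a_{t+1}) for the next step
    dones:   1.0 where the episode terminated at step t, else 0.0
    Assumption: a plain TD(0) target; SEBA's exact objective may differ.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # Bootstrap from the next-step estimate unless the episode ended.
    return rewards + gamma * (1.0 - dones) * next_q

def shadow_q_loss(q_pred, targets):
    """Mean squared TD error; targets are treated as fixed constants."""
    q_pred = np.asarray(q_pred, dtype=float)
    return float(np.mean((q_pred - targets) ** 2))

targets = shadow_q_td_targets([1.0, 0.5], next_q=[2.0, 4.0], dones=[0.0, 1.0], gamma=0.5)
print(targets.tolist())                    # [2.0, 0.5]
print(shadow_q_loss([2.0, 1.5], targets))  # 0.5
```

The second transition is terminal, so its target is just its reward; the first bootstraps from the next-step estimate.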

Abstract

Visual reinforcement learning has achieved remarkable progress in visual control and robotics, but its vulnerability to adversarial perturbations remains underexplored. Most existing black-box attacks focus on vector-based or discrete-action RL, and their effectiveness on image-based continuous control is limited by the large action space and the excessive number of environment queries they require. We propose SEBA, a sample-efficient framework for black-box adversarial attacks on visual RL agents. SEBA integrates a shadow Q model that estimates cumulative rewards under adversarial conditions, a generative adversarial network that produces visually imperceptible perturbations, and a world model that simulates environment dynamics to reduce real-environment queries. Through a two-stage iterative training procedure that alternates between learning the shadow model and refining the generator, SEBA achieves strong attack performance while remaining query-efficient. Experiments on MuJoCo and Atari benchmarks show that SEBA significantly reduces cumulative rewards, preserves visual fidelity, and greatly decreases environment interactions compared with prior black-box and white-box methods.
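
One way to read the generator stage is as minimizing a weighted sum of two terms: the shadow Q value of the action the victim takes on the perturbed frame (a lower predicted return means a stronger attack) and a non-saturating GAN term that rewards fooling a discriminator (visual realism). The function below is a hedged sketch; the name `generator_loss`, the weight `lam`, and the exact form of both terms are assumptions, not the paper's stated loss.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def generator_loss(shadow_q_adv, disc_logits_adv, lam=1.0):
    """Hypothetical generator objective (to be minimized).

    shadow_q_adv:    shadow-Q estimates Q(s_adv, pi(s_adv)) for the victim's
                     action on each perturbed observation in the batch
    disc_logits_adv: discriminator logits on the perturbed observations
                     (higher = judged more like a clean frame)
    lam:             assumed weight on the realism term
    """
    attack_term = np.mean(shadow_q_adv)  # push the predicted victim return down
    # Non-saturating GAN loss: -log(sigmoid(D)) == softplus(-D).
    realism_term = np.mean(softplus(-np.asarray(disc_logits_adv, dtype=float)))
    return float(attack_term + lam * realism_term)

# With Q = 0 and an undecided discriminator (logit 0), only log(2) remains.
print(round(generator_loss([0.0, 0.0], [0.0, 0.0]), 4))  # 0.6931
```

Raising `lam` trades attack strength for perturbations the discriminator finds harder to distinguish from clean frames.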

Paper Structure

This paper contains 31 sections, 11 equations, 1 figure, 5 tables, and 3 algorithms.

Figures (1)

  • Figure 1: SEBA overview. The world model is first trained to predict visual dynamics and generate synthetic rollouts. SEBA then alternates between updating the shadow critic on perturbed states and optimizing the GAN.
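
The ordering described in the Figure 1 caption can be written as a short control-flow sketch. Only that ordering (world model first, then alternating shadow-critic and GAN updates) comes from the caption; the function `seba_style_training` and its four callable parameters are hypothetical placeholders, and batch sizes, update counts per stage, and how real-environment data is used are not specified here.

```python
from typing import Callable, List, Sequence, Tuple

def seba_style_training(
    fit_world_model: Callable[[], None],
    imagine_rollouts: Callable[[], Sequence],
    update_shadow_critic: Callable[[Sequence], float],
    update_gan: Callable[[Sequence], float],
    n_rounds: int,
) -> List[Tuple[float, float]]:
    """Control flow of the Figure 1 procedure (hypothetical interface)."""
    fit_world_model()  # learn visual dynamics before any attack training
    history = []
    for _ in range(n_rounds):
        batch = imagine_rollouts()                 # rollouts from the learned world model
        critic_loss = update_shadow_critic(batch)  # stage 1: shadow critic on perturbed states
        gan_loss = update_gan(batch)               # stage 2: generator/discriminator step
        history.append((critic_loss, gan_loss))
    return history

# Usage with stub callables that only record the call order.
log = []

def record(tag, value=None):
    log.append(tag)
    return value

hist = seba_style_training(
    fit_world_model=lambda: record("world_model"),
    imagine_rollouts=lambda: record("rollout", ["transition"]),
    update_shadow_critic=lambda batch: record("critic", 0.0),
    update_gan=lambda batch: record("gan", 0.0),
    n_rounds=2,
)
print(log)
# ['world_model', 'rollout', 'critic', 'gan', 'rollout', 'critic', 'gan']
```

Passing the stages in as callables keeps the alternation logic separate from any particular model or framework.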