Consistent Attack: Universal Adversarial Perturbation on Embodied Vision Navigation

Chengyang Ying; You Qiaoben; Xinning Zhou; Hang Su; Wenbo Ding; Jianyong Ai

TL;DR

This paper tackles the robustness of Embodied Vision Navigation agents to universal adversarial perturbations in a sequential decision-making setting. It formulates a δ-disturbed Markov Decision Process (δ-MDP) to capture the persistent effect of a fixed perturbation δ on observations and derives a Disturbed Policy Gradient to optimize δ. Two Consistent Attack strategies, Reward UAP and Trajectory UAP, are proposed to account for environment dynamics and to estimate disturbed Q functions under different feedback signals, including a goal-based reward. Empirical results across Habitat, Gibson, and MP3D datasets with RGB and Depth inputs show that these attacks significantly degrade agent performance, underscoring important real-world safety and robustness considerations for embodied navigation systems.

Abstract

Embodied agents in vision navigation coupled with deep neural networks have attracted increasing attention. However, deep neural networks have been shown to be vulnerable to malicious adversarial noise, which may cause catastrophic failures in Embodied Vision Navigation. Among different kinds of adversarial noise, universal adversarial perturbations (UAP), i.e., a constant image-agnostic perturbation applied to every input frame of the agent, play a critical role in Embodied Vision Navigation since they are computationally efficient and practical to apply during the attack. However, existing UAP methods ignore the system dynamics of Embodied Vision Navigation and might be sub-optimal. In order to extend UAP to the sequential decision setting, we formulate the disturbed environment under the universal noise $\delta$ as a $\delta$-disturbed Markov Decision Process ($\delta$-MDP). Based on this formulation, we analyze the properties of the $\delta$-MDP and propose two novel Consistent Attack methods, named Reward UAP and Trajectory UAP, for attacking Embodied agents. Both methods consider the dynamics of the MDP and calculate universal noises by estimating the disturbed distribution and the disturbed Q function. For various victim models, our Consistent Attack can cause a significant drop in their performance in the PointGoal task in Habitat with different datasets and different scenes. Extensive experimental results indicate that there exist serious potential risks for applying Embodied Vision Navigation methods to the real world.
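To make the notion of a constant, image-agnostic perturbation concrete, the following is a minimal sketch of how a fixed noise could be applied to every observation an agent receives. It illustrates only the disturbed-observation setting, not Reward UAP or Trajectory UAP themselves. All names (`project_linf`, `apply_uap`, `ToyEnv`, `DisturbedEnv`), the $\ell_\infty$ bound `eps`, the pixel range $[0, 1]$, and the `reset`/`step` interface are assumptions made for illustration; they are not taken from the paper or from the Habitat API.

```python
import numpy as np


def project_linf(delta, eps):
    """Clip a perturbation so every entry lies in [-eps, eps] (assumed l_inf budget)."""
    return np.clip(delta, -eps, eps)


def apply_uap(obs, delta, low=0.0, high=1.0):
    """Add the same perturbation to an observation and keep pixels in [low, high]."""
    return np.clip(obs + delta, low, high)


class ToyEnv:
    """Hypothetical stand-in environment that emits random image observations."""

    def __init__(self, shape=(4, 4, 3), horizon=3, seed=0):
        self.rng = np.random.default_rng(seed)
        self.shape = shape
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.rng.uniform(0.0, 1.0, size=self.shape)

    def step(self, action):
        self.t += 1
        obs = self.rng.uniform(0.0, 1.0, size=self.shape)
        reward = 0.0  # placeholder reward; the toy task has no goal
        done = self.t >= self.horizon
        return obs, reward, done


class DisturbedEnv:
    """Wrap an environment so the agent only ever sees obs + delta.

    The underlying dynamics and rewards are untouched; only the observation
    passed to the agent is disturbed, and the same delta is used at every step.
    """

    def __init__(self, env, delta):
        self.env = env
        self.delta = delta

    def reset(self):
        return apply_uap(self.env.reset(), self.delta)

    def step(self, action):
        obs, reward, done = self.env.step(action)
        return apply_uap(obs, self.delta), reward, done


if __name__ == "__main__":
    # One fixed perturbation, reused for the whole episode.
    delta = project_linf(np.full((4, 4, 3), 0.5), eps=8 / 255)
    env = DisturbedEnv(ToyEnv(), delta)
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done = env.step(action=0)
    print("max |delta| =", float(np.abs(delta).max()))
```

Keeping the perturbation inside a wrapper reflects the point made in the abstract: because the same noise is present at every timestep, its effect is a property of the whole disturbed environment, so an attack can be evaluated on full trajectories rather than on isolated frames.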

Paper Structure (16 sections, 19 equations, 2 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Embodied Vision Navigation
Adversarial Attacks
Preliminaries
Notations
Markov Decision Process
Universal Adversarial Perturbation
UAP in MDP
Methodology
$\delta$-Markov Decision Process
Reward UAP and Trajectory UAP
Experimental Results
Environment setups
Results
...and 1 more section

Figures (2)

Figure 1: An illustration of universal adversarial perturbation on observations of an agent in Embodied Vision Navigation. In this task, the agent needs to navigate from the red triangle to the red star. The green curve is the navigable path. At each timestep, the adversary adds a consistent noise to the input frame. Finally, the adversary misleads the agent to take the blue curve.
Figure 2: Visualization of trajectories sampled by the victim policy and the attacked policy via Reward UAP on the PointGoal task in the Habitat-test dataset. The blue line is the trajectory of the Embodied agent and we also report the trajectory's Succ, SPL, and Reward.

Theorems & Definitions (1)

proof
