AdvIRL: Reinforcement Learning-Based Adversarial Attacks on 3D NeRF Models

Tommy Nguyen, Mehmet Ergezer, Christian Green

TL;DR

AdvIRL addresses the vulnerability of 3D vision systems based on Neural Radiance Fields (NeRF) to adversarial perturbations in a realistic black-box setting. It jointly leverages segmentation (Detectron2) to localize targets, Instant-NGP to render multi-view NeRFs, and a reinforcement-learning agent (PPO) to perturb NeRF parameters, guided by a CLIP-based reward to produce adversarial images $X^{\mathrm{adv}}$ with robust performance under 3D transformations. The method demonstrates targeted and untargeted attacks across scenes from Tanks and Temples and a banana scene, achieving high-confidence misclassifications such as banana$\rightarrow$slug and truck$\rightarrow$cannon, and shows potential for generating adversarial training data to bolster robustness of 3D perception systems. This work highlights practical security risks and provides a scalable framework for evaluating and improving the resilience of 3D vision pipelines.
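To make the targeted versus untargeted distinction concrete, the following is a minimal sketch of how a multi-view reward could be computed from per-view classifier outputs. It is an illustration only: the function name `multiview_reward`, the dictionary format of the classifier output, and the averaging scheme are all assumptions, and the paper's actual reward (defined in its Figure 1) is not reproduced here.

```python
from statistics import mean

def multiview_reward(view_probs, true_label, target_label=None):
    """Hypothetical reward, not the paper's definition.

    view_probs: list of {label: probability} dicts, one per rendered view
                (assumed format for the classifier output).
    """
    if not view_probs:
        raise ValueError("need at least one rendered view")
    if target_label is not None:
        # Targeted attack: reward confidence in the target class,
        # averaged over all views so one lucky angle is not enough.
        return mean(p.get(target_label, 0.0) for p in view_probs)
    # Untargeted attack: reward loss of confidence in the true class.
    return 1.0 - mean(p.get(true_label, 0.0) for p in view_probs)

# Two rendered views of the banana scene with made-up probabilities.
views = [{"banana": 0.2, "slug": 0.8}, {"banana": 0.4, "slug": 0.6}]
targeted = multiview_reward(views, "banana", target_label="slug")
untargeted = multiview_reward(views, "banana")
```

Averaging over views is one simple way to encourage perturbations that survive changes in camera angle and distance; other aggregations (for example, the minimum over views) would be stricter.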

Abstract

The increasing deployment of AI models in critical applications has exposed them to significant risks from adversarial attacks. While adversarial vulnerabilities in 2D vision models have been extensively studied, the threat landscape for 3D generative models, such as Neural Radiance Fields (NeRF), remains underexplored. This work introduces AdvIRL, a novel framework for crafting adversarial NeRF models using Instant Neural Graphics Primitives (Instant-NGP) and Reinforcement Learning. Unlike prior methods, AdvIRL generates adversarial noise that remains robust under diverse 3D transformations, including rotations and scaling, enabling effective black-box attacks in real-world scenarios. Our approach is validated across a wide range of scenes, from small objects (e.g., bananas) to large environments (e.g., lighthouses). Notably, targeted attacks achieved high-confidence misclassifications, such as labeling a banana as a slug and a truck as a cannon, demonstrating the practical risks posed by adversarial NeRFs. Beyond attacking, AdvIRL-generated adversarial models can serve as adversarial training data to enhance the robustness of vision systems. The implementation of AdvIRL is publicly available at https://github.com/Tommy-Nguyen-cpu/AdvIRL/tree/MultiView-Clean, ensuring reproducibility and facilitating future research.

Paper Structure

This paper contains 19 sections, 1 equation, 6 figures, 1 algorithm.

Figures (6)

  • Figure 1: AdvIRL begins by processing a set of input images to generate segmented images, denoted as $X^{\text{segmented}}$. These segmented images are then used to render the NeRF model, producing rendered images $X$. The initial parameters $P_0$ of the NeRF model are extracted, enabling AdvIRL to modify them as part of the pipeline. Using the initial observation space ($X$), AdvIRL generates an action $A_i$, which is concatenated with the parameters $P_i$ at timestep $i$. The updated parameters are subsequently processed by Instant-NGP to generate multi-view shots of the object. These multi-view shots are then classified using our CLIP classifier, which outputs results that the environment uses to compute a reward $R$, as defined in the accompanying figure. This reward guides the agent in determining subsequent actions. The red arrow in the diagram illustrates the feedback loop, where parameters generated by AdvIRL are fed back to Instant-NGP to produce the adversarial 3D model.
  • Figure 2: Generated results of different scenes using AdvIRL, shown from multiple angles and distances. The leftmost column shows the original images, while subsequent columns show images with applied adversarial noise.
  • Figure 3: AdvIRL-generated adversarial noise targeting the boathouse class for the lighthouse scene, with 50% classification confidence using the CLIP model.
  • Figure 4: Additional images of the adversarial train scene from various angles and distances.
  • Figure 5: Images of the adversarially perturbed truck generated by AdvIRL from additional angles.
  • ...and 1 more figure
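
The feedback loop described in the Figure 1 caption (action $A_i$ applied to parameters $P_i$, multi-view rendering, classification, reward) can be sketched with toy stand-ins. Everything below is an assumption made for illustration: `render_views` and `classify` are hypothetical placeholders for Instant-NGP and the CLIP classifier, the action is applied as an additive update, and greedy random search is swapped in for the PPO agent so the example runs with the standard library alone.

```python
import math
import random

def render_views(params, n_views=4):
    """Placeholder for Instant-NGP: each 'view' is a single number,
    the parameter mean scaled to mimic different camera distances."""
    base = sum(params) / len(params)
    return [base * (1.0 + 0.1 * k) for k in range(n_views)]

def classify(view):
    """Placeholder for the CLIP classifier: toy target-class confidence."""
    return 1.0 / (1.0 + math.exp(-view))

def reward(params):
    """Mean target-class confidence over all rendered views (assumed form)."""
    views = render_views(params)
    return sum(classify(v) for v in views) / len(views)

def attack(p0, steps=200, step_size=0.1, seed=0):
    """Greedy random search standing in for the PPO agent (not the paper's method)."""
    rng = random.Random(seed)
    params, best = list(p0), reward(p0)
    for _ in range(steps):
        action = [rng.gauss(0.0, step_size) for _ in params]  # A_i
        candidate = [p + a for p, a in zip(params, action)]   # assumed additive update of P_i
        r = reward(candidate)
        if r > best:  # keep the perturbation only if the reward improves
            params, best = candidate, r
    return params, best

p0 = [0.0] * 8  # toy initial parameters P_0
adv_params, adv_reward = attack(p0)
```

In the real pipeline the policy would be trained from the reward signal rather than accepting or rejecting random proposals, but the data flow (parameters in, multi-view renders out, classifier scores aggregated into a scalar reward) has the same shape as the loop in the caption.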