UV-Attack: Physical-World Adversarial Attacks for Person Detection via Dynamic-NeRF-based UV Mapping

Yanjie Li, Kaisheng Liang, Bin Xiao

TL;DR

UV-Attack introduces a physical adversarial attack against person detectors that uses dynamic-NeRF-based UV mapping to edit 3D clothing textures in real time. By sampling realistic unseen poses from a Gaussian Mixture Model and optimizing an Expectation over Pose Transformation (EoPT) loss, the method preserves attack efficacy across varied actions and viewpoints. The approach combines UV-volume textures, diffusion-based adversarial patches, and TPS-based texture distortions to achieve high attack success rates in both digital and physical settings, outperforming state-of-the-art baselines and transferring well across detectors. This work points to a practical, scalable route to adversarial textures for non-rigid 3D objects and has broader implications for the robustness of person-detection systems in dynamic real-world scenarios.

Abstract

In recent research, adversarial attacks on person detectors using patches or static 3D model-based texture modifications have struggled with low success rates due to the flexible nature of human movement. Modeling the 3D deformations caused by various actions has been a major challenge. Fortunately, advancements in Neural Radiance Fields (NeRF) for dynamic human modeling offer new possibilities. In this paper, we introduce UV-Attack, an approach that achieves high success rates even with extensive and unseen human actions. We address the challenge above by leveraging dynamic-NeRF-based UV mapping. UV-Attack can generate human images across diverse actions and viewpoints, and can even create novel actions by sampling from the SMPL parameter space. While dynamic NeRF models are capable of modeling human bodies, modifying clothing textures is challenging because the textures are embedded in neural network parameters. To tackle this, UV-Attack generates UV maps instead of RGB images and modifies the texture stacks. This approach enables real-time texture edits and makes the attack more practical. We also propose a novel Expectation over Pose Transformation (EoPT) loss to improve the evasion success rate on unseen poses and views. Our experiments show that UV-Attack achieves a 92.7% attack success rate (ASR) against the FastRCNN model across varied poses in dynamic video settings, significantly outperforming the state-of-the-art AdvCamou attack, which reaches only 28.5% ASR. Moreover, we achieve 49.5% ASR on the latest YOLOv8 detector in black-box settings. This work highlights the potential of dynamic-NeRF-based UV mapping for creating more effective adversarial attacks on person detectors, addressing key challenges in modeling human movement and texture modification. The code is available at https://github.com/PolyLiYJ/UV-Attack.
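
The idea of averaging an attack objective over sampled poses can be illustrated with a small sketch. It is not the authors' implementation: the mixture parameters, the `toy_score` stand-in for detector confidence, and the function names `sample_gmm` and `eopt_estimate` are all hypothetical, and the exact form of the EoPT loss is not given in this summary. The only outside fact assumed is that an SMPL pose vector has 72 entries (24 joints, 3 axis-angle values each).

```python
import numpy as np

def sample_gmm(weights, means, covs, n, rng):
    """Draw n samples from a Gaussian mixture (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)
    # Pick a mixture component for each sample, then sample from it.
    comps = rng.choice(len(weights), size=n, p=weights / weights.sum())
    return np.stack([rng.multivariate_normal(means[k], covs[k]) for k in comps])

def eopt_estimate(score_fn, poses):
    """Monte Carlo estimate of the expected detection score over poses."""
    return float(np.mean([score_fn(p) for p in poses]))

def toy_score(pose):
    # Stand-in for a detector's person confidence; NOT a real detector.
    return 1.0 / (1.0 + np.exp(-pose.sum()))

rng = np.random.default_rng(0)
dim = 72  # SMPL pose: 24 joints x 3 axis-angle parameters
means = [np.zeros(dim), 0.3 * np.ones(dim)]        # illustrative values
covs = [0.01 * np.eye(dim), 0.02 * np.eye(dim)]    # illustrative values
poses = sample_gmm([0.6, 0.4], means, covs, n=8, rng=rng)

# An attack would minimize this average with respect to the texture.
loss = eopt_estimate(toy_score, poses)
```

In a real attack the score would come from rendering the textured person in each sampled pose and running the detector, so the average is taken over poses the optimizer has not seen individually.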

Paper Structure (35 sections, 6 equations, 21 figures, 5 tables, 1 algorithm)

Figures (21)

  • Figure 2: The pipeline of UV-Attack. UV-Attack first samples random pose parameters from a Gaussian Mixture Model. We then use the sampled pose, camera, and light parameters to generate IUV maps and texture stacks. We employ a pre-trained Stable Diffusion model and modify the initial latent to generate adversarial patches. Finally, we modify the texture stacks using the adversarial patch and render the human image in RGB space. The solid lines below indicate the direction of gradient propagation, while the dashed lines above do not require gradients.
  • Figure 3: Physical attack success rates with different detectors and confidence thresholds. The attack is trained on the FastRCNN model, and the IoU threshold is set as 0.5.
  • Figure 4: Visualization of adversarial textures and rendering results for different body parts.
  • Figure 5: The impact of classifier-free guidance on the mAP (lower is better). The adversarial patch is trained on the FastRCNN model. The IoU threshold is set to 0.5. Without the guidance (the blue line), the mAP drops more significantly.
  • Figure 6: Comparison of normal images and adversarial images in digital attacks.
  • ...and 16 more figures
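
The Figure 2 caption describes editing a texture stack and then producing the RGB image from IUV maps. The sketch below illustrates why that makes texture edits cheap: the image is a lookup into the texture stack, so pasting a patch changes the output without re-running the pose-dependent step. The array layout (part index with 0 as background, then u and v in [0, 1]) and the name `render_from_iuv` are assumptions for illustration, not the paper's code. It uses nearest-neighbour lookup; a pipeline that backpropagates to the texture would need interpolated sampling instead.

```python
import numpy as np

def render_from_iuv(iuv, texture_stack):
    """Render RGB by looking up each pixel's (part, u, v) in the textures.

    iuv: (H, W, 3) array holding part index (0 = background), u, v in [0, 1].
    texture_stack: (P, T, T, 3) array, one T x T texture per body part.
    """
    part = iuv[..., 0].astype(int)
    size = texture_stack.shape[1]
    # Map continuous UV coordinates to the nearest texel index.
    u = np.clip((iuv[..., 1] * (size - 1)).round().astype(int), 0, size - 1)
    v = np.clip((iuv[..., 2] * (size - 1)).round().astype(int), 0, size - 1)
    out = np.zeros(iuv.shape[:2] + (3,), dtype=texture_stack.dtype)
    mask = part > 0  # background pixels stay black
    out[mask] = texture_stack[part[mask] - 1, v[mask], u[mask]]
    return out

# Tiny illustrative scene: two body pixels and two background pixels.
iuv = np.zeros((2, 2, 3))
iuv[0, 0] = [1, 0.0, 0.0]  # part 1, top-left texel
iuv[1, 1] = [2, 1.0, 1.0]  # part 2, bottom-right texel
textures = np.full((2, 4, 4, 3), 0.5)

before = render_from_iuv(iuv, textures)
textures[0, :2, :2] = 1.0  # paste a "patch" onto part 1's texture
after = render_from_iuv(iuv, textures)
```

Only the pixel mapped to the edited texels changes between `before` and `after`; the IUV map itself is reused unchanged.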