UV-Attack: Physical-World Adversarial Attacks for Person Detection via Dynamic-NeRF-based UV Mapping
Yanjie Li, Kaisheng Liang, Bin Xiao
TL;DR
UV-Attack introduces a physical adversarial attack against person detectors that uses dynamic NeRF-based UV mapping to edit 3D clothing textures in real time. By sampling realistic unseen poses with a Gaussian Mixture Model and optimizing an Expectation over Pose Transformation (EoPT) loss, the method preserves attack efficacy across varied actions and viewpoints. The approach combines UV-volume textures, diffusion-based adversarial patches, and TPS-based texture distortions to achieve high attack success rates in both digital and physical settings, outperforming state-of-the-art baselines and transferring well across detectors. This work points to a practical, scalable route to adversarial textures for non-rigid 3D objects and has broader implications for the robustness of person-detection systems in dynamic real-world scenarios.
Abstract
In recent research, adversarial attacks on person detectors using patches or static 3D model-based texture modifications have struggled with low success rates due to the flexible nature of human movement. Modeling the 3D deformations caused by various actions has been a major challenge. Fortunately, advancements in Neural Radiance Fields (NeRF) for dynamic human modeling offer new possibilities. In this paper, we introduce UV-Attack, a groundbreaking approach that achieves high success rates even with extensive and unseen human actions. We address this challenge by leveraging dynamic-NeRF-based UV mapping. UV-Attack can generate human images across diverse actions and viewpoints, and can even create novel actions by sampling from the SMPL parameter space. While dynamic NeRF models are capable of modeling human bodies, modifying clothing textures is challenging because the textures are embedded in neural network parameters. To tackle this, UV-Attack generates UV maps instead of RGB images and modifies the texture stacks. This approach enables real-time texture edits and makes the attack more practical. We also propose a novel Expectation over Pose Transformation (EoPT) loss to improve the evasion success rate on unseen poses and views. Our experiments show that UV-Attack achieves a 92.7% attack success rate (ASR) against the FastRCNN model across varied poses in dynamic video settings, significantly outperforming the state-of-the-art AdvCamou attack, which reaches only 28.5% ASR. Moreover, we achieve 49.5% ASR on the latest YOLOv8 detector in black-box settings. This work highlights the potential of dynamic NeRF-based UV mapping for creating more effective adversarial attacks on person detectors, addressing key challenges in modeling human movement and texture modification. The code is available at https://github.com/PolyLiYJ/UV-Attack.
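To make the idea of an expectation over sampled poses concrete, the following is a minimal sketch rather than the authors' implementation. It assumes poses are drawn from a diagonal-covariance Gaussian mixture (the TL;DR mentions a Gaussian Mixture Model) and that the loss is a Monte Carlo average of a detector score over those poses. The names `sample_pose`, `expected_pose_loss`, and `toy_score`, the 3-dimensional pose space, and the mixture parameters are all hypothetical; the actual EoPT formulation, the renderer, and the detector are not specified in this section.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One mixture component: (weight, per-dimension means, per-dimension std devs).
# Diagonal covariance is an assumption made to keep the sketch short.
Component = Tuple[float, Sequence[float], Sequence[float]]


def sample_pose(gmm: Sequence[Component], rng: random.Random) -> List[float]:
    """Draw one pose vector from a diagonal-covariance Gaussian mixture."""
    weights = [w for w, _, _ in gmm]
    # Pick a component in proportion to its weight, then sample each dimension.
    _, means, stds = rng.choices(gmm, weights=weights, k=1)[0]
    return [rng.gauss(m, s) for m, s in zip(means, stds)]


def expected_pose_loss(
    score_fn: Callable[[List[float]], float],
    gmm: Sequence[Component],
    num_samples: int,
    rng: random.Random,
) -> float:
    """Monte Carlo estimate of the mean of score_fn over sampled poses."""
    total = 0.0
    for _ in range(num_samples):
        total += score_fn(sample_pose(gmm, rng))
    return total / num_samples


def toy_score(pose: List[float]) -> float:
    """Hypothetical stand-in for rendering the textured person at `pose`
    and reading the detector's person confidence; returns a value in (0, 1]."""
    return 1.0 / (1.0 + sum(x * x for x in pose))


# Hypothetical 3-D pose space with two modes; real SMPL pose vectors
# are much higher-dimensional.
toy_gmm: List[Component] = [
    (0.7, [0.0, 0.0, 0.0], [0.1, 0.1, 0.1]),
    (0.3, [1.0, -1.0, 0.5], [0.2, 0.2, 0.2]),
]

loss = expected_pose_loss(toy_score, toy_gmm, num_samples=256, rng=random.Random(0))
```

In an actual attack, `toy_score` would be replaced by a differentiable pipeline (render with the current texture, run the detector) and the averaged score would be minimized with respect to the texture. Averaging over freshly sampled poses, instead of a fixed set, is what encourages the texture to stay effective on poses not seen during optimization.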
