OBJVanish: Physically Realizable Text-to-3D Adv. Generation of LiDAR-Invisible Objects

Bing Li, Wuqi Wang, Yanan Zhang, Jingzheng Li, Haigen Min, Wei Feng, Xingyu Zhao, Jie Zhang, Qing Guo

TL;DR

This work investigates the vulnerability of LiDAR-based 3D detectors to physically realizable adversarial content generated via text prompts. It introduces Phy3DAdvGen, which optimizes discrete verb–object–pose prompts fed into a text-to-3D generator with differentiable LiDAR rendering to produce LiDAR-invisible pedestrians and human–object configurations, constrained by a real-object pool for physical realization. Through CARLA-based empirical analysis and real-world tests, the method demonstrates high attack success across multiple detectors, highlighting that combinations of objects and prompt semantics substantially increase susceptibility, and revealing practical gaps in current defense strategies. The results underscore the need for robust, multi-sensor perception and prompt-aware defenses to mitigate generative, physically realizable threats in safety-critical environments.

Abstract

LiDAR-based 3D object detectors are fundamental to autonomous driving, where failing to detect objects poses severe safety risks. Developing effective 3D adversarial attacks is essential for thoroughly testing these detection systems and exposing their vulnerabilities before real-world deployment. However, existing adversarial attacks that add optimized perturbations to 3D points have two critical limitations: they rarely cause complete object disappearance, and they are difficult to implement in physical environments. We introduce a text-to-3D adversarial generation method, a novel approach that generates 3D models of objects that are truly invisible to LiDAR detectors and can be easily realized in the real world. Specifically, we present the first empirical study that systematically investigates the factors influencing detection vulnerability by manipulating the topology, connectivity, and intensity of individual pedestrian 3D models and by combining pedestrians with multiple objects within the CARLA simulation environment. Building on these insights, we propose physically-informed text-to-3D adversarial generation (Phy3DAdvGen), which systematically optimizes text prompts by iteratively refining verbs, objects, and poses to produce LiDAR-invisible pedestrians. To ensure physical realizability, we construct a comprehensive object pool containing 13 3D models of real objects and constrain Phy3DAdvGen to generate 3D objects based on combinations of objects in this set. Extensive experiments demonstrate that our approach can generate 3D pedestrians that evade six state-of-the-art (SOTA) LiDAR 3D detectors in both CARLA simulation and physical environments, thereby highlighting vulnerabilities in safety-critical applications.
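
To make the idea of iteratively refining a (verb, object, pose) prompt concrete, here is a minimal Python sketch. It is not the authors' implementation: it replaces the text-to-3D generator, LiDAR rendering, and detector with a made-up additive score, and it uses a plain coordinate-wise greedy search rather than the optimization used by Phy3DAdvGen. The vocabularies, weights, and the names `toy_detection_score` and `refine_prompt` are all hypothetical.

```python
# Hypothetical per-word weights standing in for how "detectable" each choice is.
# None of these words or numbers come from the paper.
VERB_W = {"holding": 0.30, "carrying": 0.20, "pushing": 0.25}
OBJ_W = {"umbrella": 0.10, "cardboard box": 0.30, "suitcase": 0.35}
POSE_W = {"standing": 0.30, "crouching": 0.15, "walking": 0.25}


def toy_detection_score(prompt):
    """Toy stand-in for detector confidence on the object a prompt would yield.

    A real pipeline would generate a 3D model from the prompt, render a LiDAR
    point cloud, and query a detector; here we just sum made-up weights.
    """
    verb, obj, pose = prompt
    return VERB_W[verb] + OBJ_W[obj] + POSE_W[pose]


def refine_prompt(start, slots, score_fn, max_rounds=10):
    """Greedy coordinate-wise search over a discrete prompt triplet.

    Each round tries every candidate word in one slot at a time and keeps a
    substitution only if it lowers the score. Stops when no slot improves.
    """
    best = tuple(start)
    best_score = score_fn(best)
    for _ in range(max_rounds):
        improved = False
        for i, candidates in enumerate(slots):
            for word in candidates:
                trial = best[:i] + (word,) + best[i + 1:]
                trial_score = score_fn(trial)
                if trial_score < best_score:
                    best, best_score, improved = trial, trial_score, True
        if not improved:
            break
    return best, best_score


slots = [list(VERB_W), list(OBJ_W), list(POSE_W)]
best_prompt, best_score = refine_prompt(
    ("holding", "suitcase", "standing"), slots, toy_detection_score
)
print(best_prompt, round(best_score, 2))
```

Restricting the object slot to a fixed list mirrors the role of the real-object pool described above: the search can only return combinations that could actually be assembled, regardless of how the score is computed.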

Paper Structure

This paper contains 18 sections, 8 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Example of Phy3DAdvGen for digital and physical deployment. Our method generates physically realizable adversarial 3D objects from discrete text prompts that successfully evade LiDAR detection (achieving "Object Vanished" status). To enable physical realization, we construct a prepared object pool and input the corresponding 3D models into our method. Phy3DAdvGen combines these prepared objects to generate adversarial 3D objects that are physically realizable without requiring 3D printing to obtain the optimized object in real-world scenarios.
  • Figure 2: An overview of our empirical analysis pipeline for studying LiDAR invisibility factors. We manipulate the object attributes of 3D pedestrian models or human-object compositions, insert them into diverse CARLA scenes to collect data (middle), and finally evaluate their detectability using LiDAR-based 3D detectors (right).
  • Figure 3: Effects of environmental factors on 3D adversarial object detectability.
  • Figure 4: Overview of the Phy3DAdvGen framework: Prompt triplets (verb, object, pose) are optimized end-to-end by backpropagating an adversarial loss from a downstream detector. These prompts guide a text-to-3D model to generate human-object compositions, which are rendered into LiDAR scenes using differentiable rendering. Successful adversarial configurations are physically aligned with real LiDAR sensors for real-world validation.
  • Figure 5: Multi-view images of the 3D adversarial object generated by Phy3DAdvGen, with prompts below.
  • ...and 6 more figures