Cheating Stereo Matching in Full-scale: Physical Adversarial Attack against Binocular Depth Estimation in Autonomous Driving

Kangqiao Zhao, Shuo Huai, Xurui Song, Jun Luo

TL;DR

This work addresses the vulnerability of stereo matching-based binocular depth estimation (SM-BDE) in autonomous driving to physical adversarial attacks. It introduces a texture-enabled 3D camouflage physical adversarial example (PAE) with a stereo-aligned rendering module that preserves disparity across the left and right views, enabling a novel merging attack that blends the adversarial object into its background. The method combines a three-stage texture optimization (boundary-depth awareness, region segmentation, and region-aware depth alignment) with a stereo-consistent rendering pipeline, and also includes an appearing-attack variant. In both digital simulations and real-world tests, the proposed attack consistently degrades SM-BDE outputs and causes downstream safety failures in autonomous driving (AD) stacks, underscoring a significant safety risk and the need for robust stereo perception defenses. Together, $d_t$-driven optimization and region-specific depth priors yield physically realizable attacks that generalize across models and conditions, with critical implications for AD safety and defense research.

Abstract

Although the deep neural models used for perception in autonomous driving have proven vulnerable to adversarial examples, known attacks often rely on 2D patches and mostly target monocular perception. Consequently, the effectiveness of Physical Adversarial Examples (PAEs) against stereo-based binocular depth estimation remains largely unexplored. To this end, we propose the first texture-enabled physical adversarial attack against stereo matching models in the context of autonomous driving. Our method employs a 3D PAE with a global camouflage texture rather than a local 2D patch, ensuring both visual consistency and attack effectiveness across the different viewpoints of stereo cameras. To cope with the disparity effect of these cameras, we also propose a new 3D stereo matching rendering module that aligns the PAE with real-world positions and headings in binocular vision. We further propose a novel merging attack that seamlessly blends the target into the environment through fine-grained PAE optimization. Compared with existing hiding attacks, which fail to merge seamlessly into the background, it significantly enhances both stealth and lethality. Extensive evaluations show that our PAEs can successfully fool stereo models into producing erroneous depth information.
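The disparity effect that a stereo rendering module has to respect follows from rectified stereo geometry: a point at depth $Z$, seen by two cameras with focal length $f$ (in pixels) and baseline $B$, appears shifted horizontally by $d = fB/Z$ between the left and right images, and SM-BDE models invert this relation to recover depth. The following is a minimal sketch of that relation, assuming an ideal rectified pinhole rig with square pixels; the class name `RectifiedStereo` and the KITTI-like parameters in the demo are illustrative assumptions, not the paper's rendering module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RectifiedStereo:
    """Hypothetical ideal rectified stereo rig: identical pinhole intrinsics,
    square pixels, right camera offset from the left by `baseline` along +x."""
    fx: float        # focal length in pixels (fy assumed equal)
    cx: float        # principal point, x (pixels)
    cy: float        # principal point, y (pixels)
    baseline: float  # distance between the two camera centres (metres)

    def project(self, x: float, y: float, z: float):
        """Project a 3D point given in the left-camera frame into both images.
        Returns ((u_left, v_left), (u_right, v_right))."""
        if z <= 0:
            raise ValueError("point must lie in front of the cameras")
        u_left = self.fx * x / z + self.cx
        # In the right camera's frame the point's x-coordinate is x - baseline.
        u_right = self.fx * (x - self.baseline) / z + self.cx
        v = self.fx * y / z + self.cy  # same row in both views after rectification
        return (u_left, v), (u_right, v)

    def disparity(self, z: float) -> float:
        """Horizontal shift u_left - u_right for a point at depth z: d = f * B / Z."""
        return self.fx * self.baseline / z

    def depth_from_disparity(self, d: float) -> float:
        """Inverse relation used by stereo depth estimators: Z = f * B / d."""
        if d <= 0:
            raise ValueError("disparity must be positive")
        return self.fx * self.baseline / d


if __name__ == "__main__":
    rig = RectifiedStereo(fx=720.0, cx=640.0, cy=360.0, baseline=0.54)
    (u_l, v_l), (u_r, v_r) = rig.project(1.0, 0.5, 10.0)
    print("measured disparity:", u_l - u_r)
    print("f*B/Z:", rig.disparity(10.0))
    print("recovered depth:", rig.depth_from_disparity(u_l - u_r))
```

With `fx=720.0` and `baseline=0.54`, a point 10 m away has a disparity of about 38.88 pixels, and doubling its depth halves that value. A texture rendered separately into each view with the correct camera extrinsics keeps this offset, whereas placing identical pixels at the same image coordinates in both views would imply zero disparity, i.e. a point at infinity.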

Paper Structure

This paper contains 25 sections, 12 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Comparison between existing PAEs and our 3D texture-enabled merging attack PAEs in the CARLA simulator dosovitskiy2017carla. (a) Benign right and left camera images with the corresponding depth map; (b) random texture; (c), (d) 2D adversarial patches from liu2024physical and cheng2021towards, respectively, applied to a 3D vehicle; (e) monocular depth estimation (MDE) adversarial texture from zheng2024physical; (f) our 3D merging attack adversarial texture.
  • Figure 2: Overview of our attack against SM-BDE models. The adversarial camouflage texture $\theta$ is mapped on the PAE and synthesized with the backgrounds ($b_l$ and $b_r$) using a differentiable renderer $R$ with rendering configurations ($k_l$ and $k_r$). The optimization of $\theta$ is driven by backpropagation, guided by a loss function tailored to the objectives of the merging attack.
  • Figure 3: Depth results for the boundary and for each segmented region.
  • Figure 4: Visualization of our 3D PAEs in the real world. Top: Benign. Bottom: Adversarial.
  • Figure 5: Our 3D texture-PAEs against SM-BDE under different viewpoint variations.
  • ...and 7 more figures
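The loop described in the Figure 2 caption (render the texture $\theta$ into both views, run the stereo model, backpropagate a merging loss) can be sketched as projected gradient descent. This is a minimal sketch under strong assumptions, not the authors' implementation: the differentiable renderer $R$ and the SM-BDE model are replaced by a hypothetical linear surrogate per rendering configuration, so that the predicted depth of object pixel `i` is `base[i] + sum_j W[i][j] * theta[j]`; the merging loss is assumed to be the mean squared gap between that prediction and the background depth behind the object, averaged over configurations; and texture values are clipped to [0, 1]. The names `merging_loss_and_grad` and `optimize_texture` are illustrative.

```python
import random


def merging_loss_and_grad(theta, views, bg_depth):
    """Merging loss and its gradient with respect to the texture parameters.

    Each view is a (base, W) pair: a hypothetical linear surrogate for the
    renderer plus stereo model under one rendering configuration, where the
    predicted depth of object pixel i is base[i] + sum_j W[i][j] * theta[j].
    bg_depth[i] is the background depth the object should blend into.
    The loss is the squared depth gap, averaged over pixels and views.
    """
    loss = 0.0
    grad = [0.0] * len(theta)
    for base, W in views:
        scale = 1.0 / (len(base) * len(views))
        for i, row in enumerate(W):
            pred = base[i] + sum(w * t for w, t in zip(row, theta))
            err = pred - bg_depth[i]
            loss += scale * err * err
            for j, w in enumerate(row):
                grad[j] += scale * 2.0 * err * w  # d(err^2)/d(theta_j)
    return loss, grad


def optimize_texture(views, bg_depth, n_params, steps=500, lr=0.05, seed=0):
    """Projected gradient descent on texture values constrained to [0, 1]."""
    rng = random.Random(seed)
    theta = [rng.random() for _ in range(n_params)]  # random initial texture
    for _ in range(steps):
        _, grad = merging_loss_and_grad(theta, views, bg_depth)
        # Gradient step, then clip back to the valid colour range.
        theta = [min(1.0, max(0.0, t - lr * g)) for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    # Two toy rendering configurations for a two-pixel object, two texture values.
    views = [
        ([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]]),
        ([0.5, 0.5], [[0.5, 0.25], [0.25, 0.5]]),
    ]
    bg_depth = [1.0, 1.0]
    theta = optimize_texture(views, bg_depth, n_params=2, steps=2000)
    print("texture:", theta)
    print("loss:", merging_loss_and_grad(theta, views, bg_depth)[0])
```

In this toy setup both configurations match the background depth exactly when both texture values equal 2/3, and the optimizer converges there. In a real pipeline the surrogate would be replaced by a differentiable renderer and a stereo network whose gradients come from automatic differentiation, and rendering configurations would typically be resampled at each step rather than fixed.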