Cheating Stereo Matching in Full-scale: Physical Adversarial Attack against Binocular Depth Estimation in Autonomous Driving
Kangqiao Zhao, Shuo Huai, Xurui Song, Jun Luo
TL;DR
This work addresses the vulnerability of stereo-based depth estimation in autonomous driving to physical adversarial attacks. It introduces a texture-enabled 3D camouflage physical adversarial example (PAE) with a stereo-aligned rendering module that preserves disparity across the left and right views, enabling a novel merging attack that blends the adversarial object into its background. The method comprises a three-stage texture optimization (boundary-depth awareness, region segmentation, and region-aware depth alignment) and a stereo-consistent rendering pipeline, plus an appearing-attack variant. Across digital simulations and real-world tests, the proposed attack consistently degrades the outputs of stereo matching-based binocular depth estimation (SM-BDE) and causes downstream safety failures in autonomous driving stacks, highlighting the need for robust stereo perception defenses. The $d_t$-driven optimization and region-specific depth priors make the attacks physically realizable and able to generalize across models and conditions.
Abstract
Although the deep neural models used for perception in autonomous driving have proven vulnerable to adversarial examples, known attacks often rely on 2D patches and mostly target monocular perception. As a result, the effectiveness of Physical Adversarial Examples (PAEs) on stereo-based binocular depth estimation remains largely unexplored. To this end, we propose the first texture-enabled physical adversarial attack against stereo matching models in the context of autonomous driving. Our method employs a 3D PAE with a global camouflage texture rather than a local 2D patch, ensuring both visual consistency and attack effectiveness across the different viewpoints of stereo cameras. To cope with the disparity effect of these cameras, we also propose a new 3D stereo matching rendering module that aligns the PAE with real-world positions and headings in binocular vision. We further propose a novel merging attack that seamlessly blends the target into the environment through fine-grained PAE optimization. Compared with existing hiding attacks, which fail to merge seamlessly into the background, it offers significantly enhanced stealth and lethality. Extensive evaluations show that our PAEs can successfully fool the stereo models into producing erroneous depth information.
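The disparity effect mentioned above can be illustrated with a minimal sketch of standard rectified pinhole stereo geometry. This is not the paper's rendering module: the function names, the camera parameters (focal_px, baseline_m, cx, cy), and the example values are all assumptions chosen for illustration. It shows why an object rendered into both views must keep a horizontal offset consistent with its depth, since a stereo matcher recovers depth as focal length times baseline divided by disparity.

```python
def project_stereo(point, focal_px, baseline_m, cx, cy):
    """Project a 3D point (left-camera frame, metres) into rectified left/right images.

    Hypothetical helper: assumes ideal, rectified pinhole cameras that share
    the same intrinsics, with the right camera offset by baseline_m along +x.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the cameras")
    u_left = focal_px * x / z + cx
    v = focal_px * y / z + cy  # rectified views share the same image row
    # Seen from the right camera, the point's x coordinate is reduced by the baseline.
    u_right = focal_px * (x - baseline_m) / z + cx
    return (u_left, v), (u_right, v)


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Recover depth from horizontal disparity: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Assumed example values: 1000 px focal length, 0.5 m baseline, point 10 m ahead.
left_px, right_px = project_stereo((1.0, 0.0, 10.0), 1000.0, 0.5, 640.0, 360.0)
disparity = left_px[0] - right_px[0]
depth = depth_from_disparity(disparity, 1000.0, 0.5)
```

Under these assumptions, a texture pasted into the two views with an inconsistent offset would correspond to a different depth than the object's true position, which is the kind of misalignment a stereo-aligned renderer has to avoid.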
