
DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches

Yun Xing, Yue Cao, Nhat Chung, Jie Zhang, Ivor Tsang, Ming-Ming Cheng, Yang Liu, Lei Ma, Qing Guo

TL;DR

DepthVanish tackles adversarial vulnerabilities in stereo depth estimation by designing patches with grid-based interval structures that remain effective when physically realized. It introduces a joint optimization framework that co-designs texture and interval patterns, guided by losses including $L_{rMSE}$, $L_{entropy}$, and $L_{tv}$, to achieve strong digital and physical attacks on models such as RAFT-Stereo and STTR, as well as on commercial RGB-D cameras. The approach demonstrates that regular interval spacing significantly enhances attack effectiveness and transferability, enabling a disappear-like effect in which patched regions are predicted at far depths. These findings reveal practical safety risks in current stereo perception pipelines and motivate the development of robust defenses and standardized evaluation for real-world deployment.
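Of the three losses named above, only $L_{tv}$ has a widely used standard form; the definitions of $L_{rMSE}$ and $L_{entropy}$ are not given here, so they are not sketched. The snippet below is a minimal sketch of anisotropic total variation, assuming the paper uses a smoothness term of this kind to keep the printed texture free of high-frequency noise. The exact formulation in DepthVanish (squared differences, normalization, weighting) may differ, and the function name is hypothetical.

```python
import numpy as np


def total_variation(patch: np.ndarray) -> float:
    """Anisotropic total variation of an H x W x C patch.

    Assumed standard form of a TV smoothness term (not taken from the paper):
    the sum of absolute differences between vertically and horizontally
    adjacent pixels. Lower values mean a smoother, more printable texture.
    """
    dh = np.abs(patch[1:, :, :] - patch[:-1, :, :]).sum()  # vertical neighbours
    dw = np.abs(patch[:, 1:, :] - patch[:, :-1, :]).sum()  # horizontal neighbours
    return float(dh + dw)


flat = np.full((4, 4, 3), 0.5)
print(total_variation(flat))  # 0.0: a constant patch has no variation
```

In an optimization loop this term would be added, with some weight, to the attack losses so that the gradient discourages pixel-level noise that a printer or camera would not reproduce.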

Abstract

Stereo depth estimation is a critical task in autonomous driving and robotics, where inaccuracies (such as misidentifying nearby objects as distant) can lead to dangerous situations. Adversarial attacks against stereo depth estimation can help reveal vulnerabilities before deployment. Previous works have shown that repeating optimized textures can effectively mislead stereo depth estimation in digital settings. However, our research reveals that these naively repeated textures perform poorly in physical implementations, i.e., when deployed as patches, limiting their practical utility for stress-testing stereo depth estimation systems. In this work, for the first time, we discover that introducing regular intervals among the repeated textures, creating a grid structure, significantly enhances the patch's attack performance. Through extensive experimentation, we analyze how variations of this novel structure influence the adversarial effectiveness. Based on these insights, we develop a novel stereo depth attack that jointly optimizes both the interval structure and texture elements. Our generated adversarial patches can be inserted into any scene and successfully attack advanced stereo depth estimation methods of different paradigms, i.e., RAFT-Stereo and STTR. Most critically, our patch can also attack commercial RGB-D cameras (Intel RealSense) in real-world conditions, demonstrating its practical relevance for the security assessment of stereo systems. The code is officially released at: https://github.com/WiWiN42/DepthVanish
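The structural idea in the abstract, repeated texture tiles separated by regular intervals to form a grid, can be illustrated with a small array-construction sketch. This is not the released DepthVanish code: `make_interval_patch` is a hypothetical helper, the gaps are assumed to be filled with a constant value, and the interval width is a fixed integer here, whereas the paper optimizes the interval structure jointly with the texture.

```python
import numpy as np


def make_interval_patch(tile, rows, cols, interval_px, fill=0.0):
    """Hypothetical helper: place `tile` in a rows x cols grid with gaps.

    tile: H x W x C array. Adjacent tiles are separated by `interval_px`
    pixels filled with the constant `fill` (an assumption for illustration).
    With interval_px == 0 this reduces to naive repetition of the texture.
    """
    h, w, c = tile.shape
    out_h = rows * h + (rows - 1) * interval_px
    out_w = cols * w + (cols - 1) * interval_px
    patch = np.full((out_h, out_w, c), fill, dtype=tile.dtype)
    for r in range(rows):
        for q in range(cols):
            y = r * (h + interval_px)  # top-left corner of tile (r, q)
            x = q * (w + interval_px)
            patch[y:y + h, x:x + w] = tile
    return patch


tile = np.ones((8, 8, 3))
p = make_interval_patch(tile, rows=2, cols=2, interval_px=4)
print(p.shape)  # (20, 20, 3): 2 * 8 pixels of texture + 1 * 4 pixels of gap
```

Setting `interval_px=0` yields the naively repeated texture that the abstract reports as weak in physical settings, which makes the helper a convenient way to compare the two structures side by side.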

Paper Structure

This paper contains 36 sections, 10 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Baseline (Stereoscopic berger2022stereoscopic) vs. our DepthVanish on attacking RAFT-Stereo lipson2021raft and Intel RealSense.
  • Figure 1: Illustration of i3DStreoid.
  • Figure 2: Adversarial effect of interval spacing on depth prediction. (a) Mean predicted depth (solid lines) and variance (shaded regions) for different interval spacing strategies, averaged over interval widths of $2-10~px$. The gray dashed band indicates $\pm1.5~m$ from the ground truth. (b) Visualization of depth prediction results for typical patches with different interval spacings, where the ground-truth depth is $7~m$.
  • Figure 2: Attack performance of our DepthVanish patch with different patch physical sizes and depths.
  • Figure 3: RAFT-Stereo depth prediction performance under various interval structures and patch rotation degrees. (a) Illustration of rotation around the X and Y axes. (b) Depth prediction performance at different rotation degrees around X axis. (c) Depth prediction performance at different rotation degrees around Y axis.
  • ...and 9 more figures
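Figure 2 judges an attack by whether the predicted depth leaves a $\pm1.5~m$ band around the $7~m$ ground truth. For a rectified stereo pair, depth and disparity are related by $Z = fB/d$ (focal length $f$ in pixels, baseline $B$ in metres, disparity $d$ in pixels), so pushing a region's disparity toward zero is what makes it appear far away, the disappear-like effect described in the TL;DR. The sketch below uses a hypothetical focal length and baseline, not the paper's camera parameters, and `disparity_to_depth` and `within_band` are illustrative names.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d (d in pixels, Z in metres)."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px


def within_band(pred_m, gt_m, tol_m=1.5):
    """True if a prediction lies inside +/- tol_m of the ground truth."""
    return abs(pred_m - gt_m) <= tol_m


# Hypothetical camera: f = 640 px, B = 0.05 m.
true_disp = 640 * 0.05 / 7.0  # about 4.57 px for an object at 7 m
attacked = disparity_to_depth(1.0, 640, 0.05)  # disparity squeezed to 1 px
print(attacked, within_band(attacked, 7.0))  # 32.0 False
```

Because depth is inversely proportional to disparity, a shift of only a few pixels in disparity is enough to move a nearby object tens of metres away in the prediction.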