DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches
Yun Xing, Yue Cao, Nhat Chung, Jie Zhang, Ivor Tsang, Ming-Ming Cheng, Yang Liu, Lei Ma, Qing Guo
TL;DR
DepthVanish tackles adversarial vulnerabilities in stereo depth estimation by designing patches with grid-based interval structures that remain effective when physically realized. It introduces a joint optimization framework that co-designs the texture and the interval pattern, guided by losses including $L_{rMSE}$, $L_{entropy}$, and $L_{tv}$, to achieve strong digital and physical attacks against models such as RAFT-Stereo and STTR as well as commercial RGB-D cameras. The approach demonstrates that regular interval spacing significantly improves attack effectiveness and transferability, producing a vanishing effect in which patched regions are predicted at far depths. These findings reveal practical safety risks in current stereo perception pipelines and motivate the development of robust defenses and standardized evaluation for real-world deployment.
Abstract
Stereo depth estimation is a critical task in autonomous driving and robotics, where inaccuracies (such as misidentifying nearby objects as distant) can lead to dangerous situations. Adversarial attacks against stereo depth estimation can help reveal vulnerabilities before deployment. Previous works have shown that repeating optimized textures can effectively mislead stereo depth estimation in digital settings. However, our research reveals that these naively repeated textures perform poorly in physical implementations, i.e., when deployed as patches, which limits their practical utility for stress-testing stereo depth estimation systems. In this work, we show for the first time that introducing regular intervals among the repeated textures, which creates a grid structure, significantly enhances the patch's attack performance. Through extensive experimentation, we analyze how variations of this novel structure influence adversarial effectiveness. Based on these insights, we develop a novel stereo depth attack that jointly optimizes both the interval structure and the texture elements. Our generated adversarial patches can be inserted into any scene and successfully attack advanced stereo depth estimation methods of different paradigms, i.e., RAFT-Stereo and STTR. Most critically, our patches can also attack commercial RGB-D cameras (Intel RealSense) in real-world conditions, demonstrating their practical relevance for security assessment of stereo systems. The code is officially released at: https://github.com/WiWiN42/DepthVanish
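
To make the grid structure described above concrete, the following minimal Python sketch tiles a single texture block on a regular grid and leaves fixed-width intervals between neighboring blocks. It is an illustration only and not the released implementation: the function name `tile_with_intervals`, the grayscale list-of-lists representation, the constant `fill` value used for the intervals, and all sizes are assumptions made for this example. In the actual attack both the texture and the interval structure are optimized jointly, which this sketch does not attempt.

```python
def tile_with_intervals(texture, rows, cols, gap, fill=0.0):
    """Repeat `texture` on a rows x cols grid with `gap`-pixel intervals.

    `texture` is a 2D list of grayscale values (hypothetical format).
    `fill` is the assumed constant intensity of the interval strips.
    """
    h, w = len(texture), len(texture[0])
    # Total size: all texture blocks plus the intervals between them.
    out_h = rows * h + (rows - 1) * gap
    out_w = cols * w + (cols - 1) * gap
    # Start from a canvas made entirely of interval pixels.
    patch = [[fill] * out_w for _ in range(out_h)]
    for r in range(rows):
        for c in range(cols):
            top, left = r * (h + gap), c * (w + gap)
            # Copy the texture block row by row into its grid cell.
            for dy in range(h):
                patch[top + dy][left:left + w] = texture[dy]
    return patch


# Toy 2x2 texture repeated on a 2x3 grid with 1-pixel intervals.
texture = [[0.2, 0.8], [0.6, 0.4]]
patch = tile_with_intervals(texture, rows=2, cols=3, gap=1)
for row in patch:
    print(row)
```

Setting `gap=0` reduces the layout to the naively repeated texture that the abstract contrasts against, so the same helper can produce both variants for a side-by-side comparison.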
