APARATE: Adaptive Adversarial Patch for CNN-based Monocular Depth Estimation for Autonomous Navigation
Amira Guesmi, Muhammad Abdullah Hanif, Ihsen Alouani, Muhammad Shafique
TL;DR
This work tackles the security risk of monocular depth estimation (MDE) in autonomous navigation by introducing APARATE, an adaptive physical adversarial patch that is sensitive to object shape, size, and distance. APARATE uses two masks and a patch transformer to expand the patch’s influence beyond the overlapped region, enabling complete concealment or depth distortion of targeted objects. A penalized depth loss, decomposed into overlapped and non-overlapped components and augmented with non-printability and total-variation terms, drives a gradient-based optimization of the patch, while an automated object-detection step localizes targets for training. Experiments on KITTI with three MDE models show substantial depth errors and near-complete region disruption, outperforming prior patches and maintaining impact under common defenses. The results highlight the need for robust defenses in real-world autonomous systems and provide a framework for evaluating adversarial patches in depth perception tasks.
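To make the structure of the objective described above concrete, the following is a minimal NumPy sketch of a loss that combines a depth term split over overlapped and non-overlapped regions with non-printability and total-variation terms. It is an illustration under stated assumptions, not the authors' implementation. The function names, the weights (`w_overlap`, `w_rest`, `w_nps`, `w_tv`), the absolute-difference depth term, the definition of the two regions from a patch mask and an object mask, and the printable colour palette are all hypothetical choices made for this example. The sketch only evaluates the loss value; a gradient-based optimizer would need an autodiff framework.

```python
import numpy as np

def total_variation(patch):
    """Sum of absolute differences between neighbouring pixels of an (H, W, C) patch."""
    dh = np.abs(patch[1:, :, :] - patch[:-1, :, :]).sum()
    dw = np.abs(patch[:, 1:, :] - patch[:, :-1, :]).sum()
    return float(dh + dw)

def non_printability(patch, printable_colors):
    """Mean distance from each patch pixel to its nearest colour in a (K, C) palette."""
    channels = patch.shape[-1]
    pixels = patch.reshape(-1, 1, channels)               # (N, 1, C)
    palette = printable_colors.reshape(1, -1, channels)   # (1, K, C)
    dists = np.linalg.norm(pixels - palette, axis=-1)     # (N, K)
    return float(dists.min(axis=1).mean())

def masked_mean(values, mask):
    """Mean of values where mask is True; 0.0 if the mask is empty."""
    return float(values[mask].mean()) if mask.any() else 0.0

def patch_loss(pred_depth, target_depth, patch_mask, object_mask,
               patch, printable_colors,
               w_overlap=1.0, w_rest=1.0, w_nps=0.1, w_tv=0.1):
    """Hypothetical composite loss; weights and region definitions are assumptions."""
    err = np.abs(pred_depth - target_depth)
    overlapped = object_mask & patch_mask        # object pixels covered by the patch
    non_overlapped = object_mask & ~patch_mask   # remaining object pixels
    depth_term = (w_overlap * masked_mean(err, overlapped)
                  + w_rest * masked_mean(err, non_overlapped))
    return (depth_term
            + w_nps * non_printability(patch, printable_colors)
            + w_tv * total_variation(patch))
```

Weighting the two regions separately lets the optimization penalize the part of the object that the patch does not cover, which is one plausible way to push the patch's influence beyond the overlapped area.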
Abstract
Monocular depth estimation (MDE) has advanced significantly in recent years, largely owing to new architectures such as convolutional neural networks (CNNs) and Transformers. Nevertheless, the susceptibility of these models to adversarial attacks has become a serious concern, especially in domains where safety and security are paramount. The concern is particularly acute for MDE because of its critical role in applications such as autonomous driving and robotic navigation, where accurate scene understanding is essential. To assess the vulnerability of CNN-based depth prediction methods, recent work has designed adversarial patches against MDE. However, existing approaches fail to disrupt the vision system comprehensively: their influence is partial and confined to specific local areas. They induce erroneous depth predictions only within the region where the patch overlaps the input image, and they ignore characteristics of the target object such as its size, shape, and position. In this paper, we introduce a novel adversarial patch named APARATE. The patch can selectively undermine MDE in two distinct ways: by distorting the estimated distances or by making an object appear to vanish from the view of the autonomous system. Notably, APARATE is designed to be sensitive to the shape and scale of the target object, and its influence extends beyond the immediate proximity of the patch. When applied to CNN-based MDE models, APARATE produces a mean depth estimation error exceeding $0.5$ and affects up to $99\%$ of the targeted region. On Transformer-based MDE, it yields an error of $0.34$ and affects $94\%$ of the target region.
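The two figures quoted above, a mean depth estimation error and a percentage of the target region affected, can be read as masked statistics over the difference between benign and adversarial depth maps. The sketch below shows one way such metrics might be computed. The function names, the use of an absolute difference, and the `threshold` value of 0.1 that decides whether a pixel counts as affected are assumptions made for illustration; the abstract does not state the exact definitions.

```python
import numpy as np

def mean_depth_error(benign, adversarial, target_mask):
    """Mean absolute depth change over the target region (assumed definition)."""
    diff = np.abs(adversarial - benign)
    return float(diff[target_mask].mean())

def affected_ratio(benign, adversarial, target_mask, threshold=0.1):
    """Fraction of target pixels whose depth change exceeds a hypothetical threshold."""
    diff = np.abs(adversarial - benign)
    return float((diff[target_mask] > threshold).mean())

# Toy example: a 2x2 target region inside a 4x4 depth map.
benign = np.zeros((4, 4))
adversarial = np.zeros((4, 4))
adversarial[1:3, 1:3] = [[0.6, 0.6], [0.6, 0.0]]
target_mask = np.zeros((4, 4), dtype=bool)
target_mask[1:3, 1:3] = True

error = mean_depth_error(benign, adversarial, target_mask)
ratio = affected_ratio(benign, adversarial, target_mask)
```

In this toy case three of the four target pixels change by 0.6, so the error is the average of those four changes and the ratio is the share of pixels above the threshold.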
