Physical Adversarial Attack on Monocular Depth Estimation via Shape-Varying Patches

Chenxing Zhao, Yang Li, Shihao Wu, Wenyi Tan, Shuangju Zhou, Quan Pan

TL;DR

This work addresses the vulnerability of monocular depth estimation to physical adversarial patches by introducing the ASP framework, which optimizes patch content, shape, and position using differentiable quadrilateral and circular masks. A novel depth-focused loss encourages depth changes to propagate beyond the patch, enabling manipulation of entire target objects in the depth map. The approach demonstrates superior attack efficacy across self-supervised MDE models and persists under several common defenses, with successful physical-world demonstrations and comprehensive ablations. The results underscore significant safety implications for autonomous systems relying on monocular depth cues and call for robust defense mechanisms against shape-aware patch attacks.

Abstract

Adversarial attacks against monocular depth estimation (MDE) systems pose significant challenges, particularly in safety-critical applications such as autonomous driving. The effect of existing patch-based adversarial attacks on MDE is confined to the vicinity of the patch, which makes it difficult to affect the entire target. To address this limitation, we propose a physics-based adversarial attack on monocular depth estimation using a framework called Attack with Shape-Varying Patches (ASP), which optimizes patch content, shape, and position to maximize effectiveness. We introduce several mask shapes, including quadrilateral, rectangular, and circular masks, to enhance the flexibility and efficiency of the attack. Furthermore, we propose a new loss function that extends the influence of the patch beyond the region it overlaps. Experimental results demonstrate that our attack method generates an average depth error of 18 meters on the target car with a patch area of 1/9, affecting over 98% of the target area.
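The two figures quoted at the end of the abstract (mean depth error on the target and the fraction of the target area affected) can be illustrated with a minimal sketch. The function names, the use of the absolute depth difference, and the 1-meter threshold for counting a pixel as "affected" are all assumptions made for illustration; the abstract does not state how the paper defines them.

```python
import numpy as np

def mean_depth_error(benign_depth, attacked_depth, object_mask):
    """Mean absolute depth change over the object pixels (hypothetical helper)."""
    mask = object_mask.astype(bool)
    return float(np.abs(attacked_depth - benign_depth)[mask].mean())

def affected_ratio(benign_depth, attacked_depth, object_mask, threshold=1.0):
    """Fraction of object pixels whose depth changed by more than `threshold`.

    The threshold value is an assumption, not taken from the paper.
    """
    mask = object_mask.astype(bool)
    changed = np.abs(attacked_depth - benign_depth)[mask] > threshold
    return float(changed.mean())

# Toy depth maps: a flat scene at 7 m, with a 2x2 block pushed 18 m farther.
benign = np.full((4, 4), 7.0)
attacked = benign.copy()
attacked[1:3, 1:3] += 18.0

# Toy object mask covering 8 pixels, 4 of which overlap the changed block.
obj = np.zeros((4, 4))
obj[1:3, 0:4] = 1

error = mean_depth_error(benign, attacked, obj)
ratio = affected_ratio(benign, attacked, obj)
print(error, ratio)
```

In this toy case only half of the object pixels change, so the mean error over the object is half of the 18 m shift. An attack whose effect stays near the patch scores low on both quantities, which is the limitation the abstract describes.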

Paper Structure

This paper contains 26 sections, 21 equations, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: (a) Benign image, in which the mean depth of the car is 6.77 meters. (b) Patch taken directly from the source code of the Distribution Model (DisM) [yamanaka2020adversarial]; it introduces a mean depth error of 1.45 meters on the car. (c) Patch trained with the source code of the StylePatch Model [cheng2022physical]; it introduces a mean depth error of 7.19 meters on the car. (d) Our patch trained to maximize the distance of the car; the mean depth error reaches 19.17 meters. (e) Our patch trained to make the whole car vanish from the depth map, rather than only a portion of it. (f) Our patch trained to bring the car closer; the mean depth of the car is reduced to 5.05 meters.
  • Figure 2: Overview of the framework. The inputs are shown in the blue box and include the Benign Object ($O$), the Object Mask ($m_O$), and the Random Scene ($RS$). The Patch ($p$) and the Patch Mask ($m_p$) in the yellow box are the quantities we optimize. See Section \ref{sec:overview} for details on the framework diagram.
  • Figure 3: Arbitrary Quadrilateral Mask. Here, $h$ and $w$ denote the height and width of the mask, and $\Theta_1 = [l, r, t, b]$ are the boundary parameters we set.
  • Figure 4: Circular Mask. (a) Circular mask without binarization; (b) circular mask with binarization; (c) oval mask.
  • Figure 5: Patch Robustness. Benign Error: Error caused by the defense in benign cases. Attack Error: Error caused by our attack.
  • ...and 5 more figures
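
The distinction in Figure 4 between a circular mask with and without binarization can be sketched as follows. This is a minimal illustration, not the paper's parameterization: the sigmoid falloff, the `sharpness` parameter, and the function names are assumptions. The soft mask varies smoothly with the center and radius, which is what allows shape and position to be optimized by gradient descent, while the binarized version is the hard mask that would be applied to a printed patch.

```python
import numpy as np

def soft_circle_mask(h, w, cx, cy, radius, sharpness=1.0):
    """Soft circular mask in [0, 1]: near 1 inside the circle, near 0 outside.

    A sigmoid of (radius - distance) is an assumed choice of smooth falloff.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.sqrt((xs - cx) ** 2 + (ys - cy) ** 2)
    # Clip the exponent so large images or sharpness values do not overflow.
    z = np.clip(sharpness * (radius - dist), -60.0, 60.0)
    return 1.0 / (1.0 + np.exp(-z))

def binarize(mask, threshold=0.5):
    """Hard 0/1 mask obtained by thresholding the soft mask."""
    return (mask >= threshold).astype(np.float64)

soft = soft_circle_mask(9, 9, cx=4, cy=4, radius=3)
hard = binarize(soft)
print(np.round(soft, 2))
print(hard)
```

Using separate horizontal and vertical radii in the distance term would give an oval mask in the same way.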