Improving Domain Generalization in Self-supervised Monocular Depth Estimation via Stabilized Adversarial Training

Yuanqi Yao, Gang Wu, Kui Jiang, Siao Liu, Jian Kuai, Xianming Liu, Junjun Jiang

TL;DR

This paper proposes a general adversarial training framework, named Stabilized Conflict-optimization Adversarial Training (SCAT), integrating adversarial data augmentation into self-supervised MDE methods to achieve a balance between stability and generalization.

Abstract

Learning a self-supervised Monocular Depth Estimation (MDE) model that generalizes well remains challenging. Although adversarial augmentation has improved generalization in supervised learning, naively incorporating it into self-supervised MDE models can cause over-regularization and severe performance degradation. In this paper, we conduct a qualitative analysis and identify the main causes: (i) inherent sensitivity in the UNet-like depth network and (ii) a dual optimization conflict caused by over-regularization. To tackle these issues, we propose a general adversarial training framework, named Stabilized Conflict-optimization Adversarial Training (SCAT), which integrates adversarial data augmentation into self-supervised MDE methods to balance stability and generalization. Specifically, we devise a scaling depth network that tunes the coefficients of the long skip connections and stabilizes the training process. Then, we propose a conflict gradient surgery strategy, which progressively integrates the adversarial gradient and optimizes the model toward a conflict-free direction. Extensive experiments on five benchmarks demonstrate that SCAT achieves state-of-the-art performance and significantly improves the generalization capability of existing self-supervised MDE methods.
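
To make the idea of a "conflict-free direction" concrete, the following is a minimal sketch and not the paper's implementation. It assumes a PCGrad-style projection: when the adversarial gradient has a negative cosine similarity with the self-supervised gradient, the opposing component is removed before the two are combined. The function names and the `weight` parameter (standing in for the progressive integration of the adversarial gradient) are hypothetical, and gradients are represented as plain Python lists.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


def project_if_conflicting(g_adv, g_self):
    """Assumed PCGrad-style step (hypothetical helper).

    If g_adv points against g_self (negative inner product), subtract the
    component of g_adv along g_self so the result is orthogonal to g_self.
    Otherwise return g_adv unchanged.
    """
    inner = dot(g_adv, g_self)
    norm_sq = dot(g_self, g_self)
    if inner >= 0.0 or norm_sq == 0.0:
        return list(g_adv)
    scale = inner / norm_sq
    return [a - scale * s for a, s in zip(g_adv, g_self)]


def combined_update(g_self, g_adv, weight):
    """Add the de-conflicted adversarial gradient, scaled by `weight`.

    `weight` is an assumed knob in [0, 1]; ramping it up over training is
    one way to integrate the adversarial gradient progressively.
    """
    g_adv_fixed = project_if_conflicting(g_adv, g_self)
    return [s + weight * a for s, a in zip(g_self, g_adv_fixed)]


if __name__ == "__main__":
    g_self = [1.0, 0.0]   # gradient of the self-supervised loss
    g_adv = [-1.0, 1.0]   # gradient of the adversarial objective (conflicting)
    print(cosine_similarity(g_adv, g_self))
    print(project_if_conflicting(g_adv, g_self))
    print(combined_update(g_self, g_adv, weight=0.5))
```

After the projection, the adversarial gradient no longer has a component opposing the self-supervised one, so the combined step cannot undo progress on the self-supervised loss to first order. How SCAT actually schedules and projects the gradients is defined in the paper's method section, not here.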

Paper Structure

This paper contains 22 sections, 13 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: Visualization of domain generalization across various methods. (a) Comparisons with offline scenario-specific data augmentation methods. (b) Comparisons with vanilla Gaussian noise and vanilla adversarial data augmentation. (c) Comparisons of different LSC scaling factors. The results show that our SCAT generalizes well to multiple unseen domains while retaining performance on the original training set.
  • Figure 2: Overview of our SCAT architecture. Our method introduces an adversarial noise generator, which is optimized through $\mathcal{L}_{AD}$ and acts as an adversarial constraint, while $\mathcal{L}_p$ imposes self-supervised constraints to optimize the self-supervised MDE model. CGS incrementally applies adversarial augmenters from multiple iterations, balancing model generalization and stability. Meanwhile, we utilize a scaling depth network (SDN) to stabilize the training process.
  • Figure 3: (a) Statistics of gradient cosine similarity. Through CGS, the distribution of cosine similarities shifts from negatively skewed to positively skewed, mitigating the previously prevalent issue of adversely oriented gradients. (b) Illustration of the training oscillation issue arising from the dual optimization conflict.
  • Figure 4: Example images with different $\epsilon_m$ of adversarial perturbation.
  • Figure 5: Qualitative results for KITTI-C. As SOTA self-supervised MDE methods, MonoDepth and MonoViT excel on the KITTI dataset, but struggle to accurately infer depth from various types of corrupted images in out-of-distribution (OoD) domains. With our SCAT framework, their depth estimation performance in challenging cross-domain scenarios can be substantially improved.
  • ...and 3 more figures