EC-Depth: Exploring the consistency of self-supervised monocular depth estimation in challenging scenes

Ziyang Song, Ruijie Zhu, Chuxin Wang, Jiacheng Deng, Jianfeng He, Tianzhu Zhang

TL;DR

EC-Depth addresses the brittleness of self-supervised monocular depth estimation under adverse conditions with a two-stage framework. Stage one uses a perturbation-invariant depth consistency loss to propagate supervision from standard scenes to challenging ones. Stage two applies Mean Teacher self-distillation with a consistency-based pseudo-label filtering strategy to refine that supervision. The approach reports strong improvements on KITTI-C and in zero-shot generalization to DrivingStereo and NuScenes-Night, while preserving performance on standard KITTI. Key contributions include the perturbation-invariant depth consistency loss, a two-perturbation image triplet design with feature-level perturbations, and a dual filtering scheme (geometric consistency plus depth consistency) for selecting reliable pseudo-labels. The method is agnostic to the network architecture, so it can be applied to different depth networks.

Abstract

Self-supervised monocular depth estimation holds significant importance in the fields of autonomous driving and robotics. However, existing methods are typically trained and tested on standard datasets, overlooking the impact of various adverse conditions prevalent in real-world applications, such as rainy days. As a result, these methods are commonly observed to struggle in such challenging scenarios. To address this issue, we present EC-Depth, a novel self-supervised two-stage framework for robust depth estimation. In the first stage, we propose depth consistency regularization to propagate reliable supervision from standard to challenging scenes. In the second stage, we adopt the Mean Teacher paradigm and propose a novel consistency-based pseudo-label filtering strategy to improve the quality of pseudo-labels, further improving both the accuracy and robustness of our model. Extensive experiments demonstrate that our method achieves accurate and consistent depth predictions in both standard and challenging scenarios, surpassing existing state-of-the-art methods on the KITTI, KITTI-C, DrivingStereo, and NuScenes-Night benchmarks.
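The first-stage idea of depth consistency regularization can be illustrated with a minimal sketch. The exact loss used by the authors is not given in this summary, so the form below (a mean symmetric relative difference between the depth predicted for a clean image and for its perturbed copy) is an assumption, and the function name `depth_consistency_loss` is hypothetical. Depth maps are represented as flat lists of positive floats to keep the example dependency-free.

```python
def depth_consistency_loss(depth_clean, depth_perturbed, eps=1e-6):
    """Illustrative consistency penalty between two depth maps.

    ASSUMPTION: the loss form is a stand-in, not the paper's definition.
    Each pixel contributes |d_c - d_p| / (d_c + d_p), which is 0 when the
    two predictions agree and grows toward 1 as they diverge.
    """
    if len(depth_clean) != len(depth_perturbed):
        raise ValueError("depth maps must have the same number of pixels")
    if not depth_clean:
        raise ValueError("depth maps must not be empty")

    total = 0.0
    for d_c, d_p in zip(depth_clean, depth_perturbed):
        total += abs(d_c - d_p) / (d_c + d_p + eps)
    return total / len(depth_clean)


# Identical predictions incur no penalty; disagreeing ones are penalized.
same = depth_consistency_loss([5.0, 12.0, 30.0], [5.0, 12.0, 30.0])
diff = depth_consistency_loss([1.0, 3.0], [3.0, 1.0])
```

Minimizing a term like this alongside the usual self-supervised objective pushes the network toward the same depth for a scene regardless of the perturbation, which is how supervision learned on standard images can carry over to challenging ones.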

Paper Structure

This paper contains 16 sections, 13 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1:
  • Figure 2:
  • Figure 4: The first-stage training framework of EC-Depth. In the first stage, we train DepthNet and PoseNet with the perturbation-invariant depth consistency loss.
  • Figure 5: The second-stage training framework of EC-Depth. In the second stage, we leverage the Mean Teacher paradigm to generate pseudo-labels for self-distillation. In particular, we propose a depth consistency-based filter (DC-Filter) and a geometric consistency-based filter (GC-Filter) to filter out unreliable pseudo-labels.
  • Figure 6: Qualitative results on the KITTI and KITTI-C benchmarks. Our method can predict accurate and consistent depth maps even under severe perturbations.
  • ...and 1 more figures
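
The second-stage framework described in the Figure 5 caption combines a Mean Teacher with two pseudo-label filters. The sketch below is a simplified illustration under stated assumptions, not the authors' implementation: the teacher is updated as an exponential moving average of the student (the standard Mean Teacher rule), the DC-Filter is approximated by a relative-difference test between teacher depths for the clean and perturbed inputs, and the GC-Filter is approximated by a test on a precomputed per-pixel geometric error. The function names, the momentum value, and both thresholds are hypothetical.

```python
def ema_update(teacher_weights, student_weights, momentum=0.99):
    """Mean Teacher update: teacher = m * teacher + (1 - m) * student."""
    return [
        momentum * t + (1.0 - momentum) * s
        for t, s in zip(teacher_weights, student_weights)
    ]


def pseudo_label_mask(depth_clean, depth_perturbed, geom_error,
                      dc_thresh=0.1, gc_thresh=0.1):
    """Return True for pixels whose pseudo-label is kept.

    ASSUMPTION: both tests are simplified stand-ins for the paper's filters.
    - depth-consistency test: teacher depths for the clean and perturbed
      inputs differ by less than dc_thresh, relative to the clean depth.
    - geometric-consistency test: a precomputed per-pixel geometric error
      (for example from cross-view reprojection) is below gc_thresh.
    """
    mask = []
    for d_c, d_p, g in zip(depth_clean, depth_perturbed, geom_error):
        dc_ok = abs(d_c - d_p) / d_c < dc_thresh
        gc_ok = g < gc_thresh
        mask.append(dc_ok and gc_ok)
    return mask


# One weight, for illustration: the teacher moves slowly toward the student.
new_teacher = ema_update([1.0], [0.0], momentum=0.9)

# Pixel 0 passes both tests, pixel 1 fails the depth test,
# pixel 2 fails the geometric test.
mask = pseudo_label_mask(
    depth_clean=[10.0, 10.0, 10.0],
    depth_perturbed=[10.5, 13.0, 10.2],
    geom_error=[0.01, 0.01, 0.5],
)
```

In a setup like this, only pixels where the mask is True would contribute to the distillation loss, so the student is not trained on teacher predictions that are unstable under perturbation or geometrically inconsistent.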