High-Precision Self-Supervised Monocular Depth Estimation with Rich-Resource Prior

Wencheng Han, Jianbing Shen

TL;DR

This work tackles the practicality gap in self-supervised monocular depth estimation by introducing the Rich-resource Prior Depth estimator (RPrDepth), which uses rich-resource priors extracted offline to guide a low-resolution (LR) single-image depth estimator. The method employs a two-branch training pipeline with a ref-dataset of rich-resource data, a Prior Depth Fusion Module to fuse prior information, a Rich-resource Guided Loss to exploit pseudo-label guidance and viewpoint consistency, and an Attention Guided Feature Selection strategy to shrink the reference-data search space. Empirically, RPrDepth achieves state-of-the-art or competitive performance on the KITTI Eigen split, Make3D, and Cityscapes, matching or outperforming strong baselines that rely on rich-resource inputs during inference while itself using only LR single-image inputs at test time. The approach improves robustness to moving objects and texture ambiguities by leveraging structured priors, making high-accuracy depth estimation more practical for real-world deployment.

Abstract

In the area of self-supervised monocular depth estimation, models that utilize rich-resource inputs, such as high-resolution and multi-frame inputs, typically achieve better performance than models that use an ordinary single-image input. However, these rich-resource inputs may not always be available, limiting the applicability of these methods in general scenarios. In this paper, we propose the Rich-resource Prior Depth estimator (RPrDepth), which requires only a single input image during the inference phase but can still produce highly accurate depth estimations comparable to those of rich-resource-based methods. Specifically, we treat rich-resource data as prior information and extract features from it as reference features in an offline manner. When estimating the depth for a single input image, we search for similar pixels from the rich-resource features and use them as prior information to estimate the depth. Experimental results demonstrate that our model outperforms other single-image models and can achieve comparable or even better performance than models with rich-resource inputs, while using only low-resolution single-image input.
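
The search step described in the abstract can be illustrated with a minimal sketch. It assumes that reference features are plain vectors paired with a depth value and that similarity is measured with cosine similarity; neither detail is stated in the abstract, and the names `cosine_similarity`, `retrieve_prior`, and `reference_bank` are hypothetical. It is not the authors' implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def retrieve_prior(query_feature, reference_bank):
    """Return (similarity, depth) of the reference entry closest to the query.

    reference_bank is a list of (feature_vector, depth) pairs, assumed here to
    have been extracted offline from rich-resource inputs.
    """
    best_sim, best_depth = -math.inf, None
    for ref_feature, ref_depth in reference_bank:
        sim = cosine_similarity(query_feature, ref_feature)
        if sim > best_sim:
            best_sim, best_depth = sim, ref_depth
    return best_sim, best_depth


# Toy reference bank: three offline features with made-up depths in metres.
reference_bank = [
    ([1.0, 0.0, 0.0], 5.0),
    ([0.0, 1.0, 0.0], 20.0),
    ([0.0, 0.0, 1.0], 60.0),
]

# Feature of one pixel from the low-resolution single image (made-up values).
query = [0.1, 0.9, 0.0]
similarity, prior_depth = retrieve_prior(query, reference_bank)
print(prior_depth)
```

In this toy example the query is closest to the second reference feature, so its depth is returned as the prior. A linear scan like this grows with the size of the reference bank, which is why reducing the search space matters in practice.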

Paper Structure

This paper contains 14 sections, 13 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Our main motivation. In self-supervised monocular depth estimation, models using rich-resource inputs generally achieve better performance. We aim to extract prior data from rich-resource inputs offline during training and use it to enhance models that take single-image inputs.
  • Figure 2: Illustration of the Training Phase of Our Pipeline. Our pipeline comprises two branches: rich-resource and LR single-image. The former generates precise depth maps and features from rich-resource images, while the latter leverages these features to achieve comparable performance.
  • Figure 3: Illustration of the Prior Depth Fusion Module.
  • Figure 4: Illustration of the Loss and Inference Pipeline. (a) Illustration of the Rich-resource guided loss. (b) Illustration of Attention Guided Feature Selection. (c) The Inference Pipeline of RPrDepth.
  • Figure 5: Qualitative results on the KITTI Eigen split test set. Our RPrDepth can correct the errors of both LR single-image models and rich-resource based models.
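
The Attention Guided Feature Selection named in Figure 4(b) is described here only as a way to reduce the reference-data search space. One plausible reading, sketched below, keeps the `k` reference features with the highest attention scores. How the scores are computed and how selection is actually performed are not given in this summary, so the function `select_reference_features`, the scores, and the choice of top-k are all assumptions for illustration.

```python
import heapq


def select_reference_features(features, attention_scores, k):
    """Keep the k features with the highest attention scores.

    The selected features are returned in their original order.
    """
    if len(features) != len(attention_scores):
        raise ValueError("features and attention_scores must have equal length")
    top_indices = heapq.nlargest(
        k, range(len(features)), key=lambda i: attention_scores[i]
    )
    return [features[i] for i in sorted(top_indices)]


# Toy example: five reference features with made-up attention scores.
features = ["f0", "f1", "f2", "f3", "f4"]
attention_scores = [0.05, 0.40, 0.10, 0.30, 0.15]

selected = select_reference_features(features, attention_scores, k=2)
print(selected)
```

With a reduced set like `selected`, any later similarity search only has to visit `k` entries instead of the full reference set.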