Real-World Depth Recovery via Structure Uncertainty Modeling and Inaccurate GT Depth Fitting

Delong Suzhang, Meng Yang

TL;DR

The paper tackles real-world depth recovery under limited paired raw-GT data by modeling both input and output structure uncertainties. It introduces a raw depth generation pipeline to diversify misalignments, a structure uncertainty module guided by a depth foundation model, and a robust feature alignment module to align depth with RGB structure while mitigating inaccurate GT depth effects. Through extensive experiments on RGBDD and Middlebury 2014, the approach achieves state-of-the-art accuracy and strong generalization across ToF, heavily distorted, and noisy-depth scenarios, with clear ablations validating each component. The work advances practical RGB-D depth recovery by enabling robust performance in real-world, diverse distortion conditions and demonstrates compatibility with multiple backbone architectures.

Abstract

Low-quality structure in raw depth maps is prevalent in real-world RGB-D datasets, which has made real-world depth recovery a critical task in recent years. However, the lack of paired raw and ground-truth (raw-GT) data in the real world poses challenges for generalized depth recovery. Existing methods insufficiently consider the diversity of structure misalignment in raw depth maps, which leads to poor generalization. Notably, random structure misalignments are not limited to raw depth data but also affect GT depth in real-world datasets. We tackle the generalization problem from both the input and output perspectives. For the input, we enrich the diversity of structure misalignment in raw depth maps with a new raw depth generation pipeline, which helps the network avoid overfitting to a specific condition. Furthermore, a structure uncertainty module explicitly identifies misaligned structure in input raw depth maps so that the network generalizes better to unseen scenarios. A well-trained depth foundation model (DFM) helps this module estimate structure uncertainty more accurately. For the output, a robust feature alignment module aligns precisely with the accurate structure of RGB images while avoiding interference from inaccurate GT depth. Extensive experiments on multiple datasets demonstrate that the proposed method achieves competitive accuracy and generalization across various challenging raw depth maps.

Paper Structure

This paper contains 19 sections, 10 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: The effects of real-world structure misalignment in depth maps. (a) Conventional depth recovery training insufficiently accounts for the diversity of structural misalignment in real-world depth maps, leading to overfitted results in generalization tests. (b) Our proposed approach explicitly models diverse structural misalignment patterns across both raw and GT depth maps, enabling robust generalization to real-world scenarios.
  • Figure 2: The visual results of our raw depth generation. Beyond the conventional simulation of noisy super-resolution, we further simulate structural misalignment from GT depth maps, making the generated data more representative of real-world raw depth. (a) RGB image, (b) GT depth, (c), (d), and (e) different random structure misalignment simulations, (f) the real-world raw depth map captured by a lightweight ToF sensor [rgbdd].
  • Figure 3: The structure uncertainty estimation module of the raw depth map. A lightweight neural network (NN) estimates the uncertainty of real-world raw depth maps, formulated as a classification problem. The error label is generated from the difference between raw depth and GT depth, binarized with a threshold of $0.1 \times \max(\mathrm{GT})$.
  • Figure 4: The overall framework of our proposed model. The input and output of our robust feature alignment module are illustrated in detail. Specifically, the structure uncertainty model leverages a relative depth estimate from a depth foundation model (DFM) [dav2] to quantify the uncertainty of various raw depth maps, with the DFM weights kept fixed during training. Additionally, our feature alignment module is designed to align with the precise RGB feature, mitigating the interference of inaccurate GT depth. The module is compatible with different network backbones, including U-Net [wang2024g2], ViT [dav2], and ConvNeXt [woo2023convnext].
  • Figure 5: Visual results of the generalization test on RGBDD. (a) RGB, (b) Raw, (c) DCT [DCTNet], (d) G2 [wang2024g2], (e) SGN [wang2024sgnet], (f) C2F [wang2023crf], (g) Ours, (h) GT.
  • ...and 1 more figures
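
The Figure 3 caption says the error label is obtained by binarizing the discrepancy between raw and GT depth at a threshold of $0.1 \times \max(\mathrm{GT})$. The following is a minimal sketch of that step, not the authors' implementation. It assumes the discrepancy is the per-pixel absolute difference and that a pixel is labeled 1 when the difference strictly exceeds the threshold. The function name `error_label`, the `ratio` parameter, and the use of nested Python lists for depth maps are hypothetical choices made for illustration.

```python
def error_label(raw, gt, ratio=0.1):
    """Return a binary map: 1 where |raw - gt| > ratio * max(gt), else 0.

    Assumptions (not stated in the source): absolute difference is used,
    and the comparison is strict. Depth maps are equal-sized nested lists.
    """
    # Threshold is a fixed fraction of the largest GT depth value.
    threshold = ratio * max(max(row) for row in gt)
    return [
        [1 if abs(r - g) > threshold else 0 for r, g in zip(raw_row, gt_row)]
        for raw_row, gt_row in zip(raw, gt)
    ]


# Toy 2x2 example: max(gt) = 10.0, so the threshold is 1.0.
gt = [[1.0, 2.0], [4.0, 10.0]]
raw = [[1.5, 2.0], [6.0, 8.5]]
labels = error_label(raw, gt)
print(labels)
```

In this toy example, only the bottom-row pixels differ from GT by more than the threshold, so only they are marked as misaligned. In practice such a label map would serve as the classification target for the uncertainty network described in the caption.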