Towards Robust Time-of-Flight Depth Denoising with Confidence-Aware Diffusion Model

Changyong He, Jin Zeng, Jiawei Zhang, Jiajie Guo

TL;DR

To address severe ToF depth noise, the paper introduces DepthCAD, a confidence-aware diffusion framework that denoises raw ToF correlation measurements rather than depth maps. It adapts pretrained Stable Diffusion 2.1 to the ToF domain by running diffusion on raw correlations after dynamic range normalization, which bridges the domain gap to color images, and it injects a gradient-based confidence map to balance global structure against local fidelity. The two components, the Raw Data Diffusion Module and the Confidence Guidance Module, together achieve global structural smoothness while preserving metric accuracy. The paper reports that DepthCAD outperforms state-of-the-art methods on synthetic FLAT data and remains robust to real-world ToF noise in Kinect v2 measurements. Code is available for reproducibility.

Abstract

Time-of-Flight (ToF) sensors efficiently capture scene depth, but the nonlinear depth construction procedure often results in extremely large noise variance or even invalid areas. Recent methods based on deep neural networks (DNNs) achieve enhanced ToF denoising accuracy but tend to struggle when presented with severe noise corruption due to limited prior knowledge of ToF data distribution. In this paper, we propose DepthCAD, a novel ToF denoising approach that ensures global structural smoothness by leveraging the rich prior knowledge in Stable Diffusion and maintains local metric accuracy by steering the diffusion process with confidence guidance. To adapt the pretrained image diffusion model to ToF depth denoising, we apply diffusion to raw ToF correlation measurements with dynamic range normalization before converting them to depth maps. Experimental results validate the state-of-the-art performance of the proposed scheme, and the evaluation on real data further verifies its robustness against real-world ToF noise.
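To make the "nonlinear depth construction" and the role of normalization concrete, here is a minimal per-pixel sketch. It uses the textbook single-frequency, four-phase ToF model, not the paper's pipeline: real sensors typically combine several modulation frequencies and need phase unwrapping, which is ignored here. The function names, the sinusoidal correlation model, and the peak-based `normalize_correlations` are illustrative assumptions; the paper's actual dynamic range normalization is not specified in this summary.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def simulate_correlations(depth_m, mod_freq_hz, amplitude=1.0, offset=0.0):
    """Ideal correlation samples at phase offsets 0, 90, 180, 270 degrees.

    Assumed model: q_k = amplitude * cos(phase + k * pi / 2) + offset.
    """
    phase = 4.0 * math.pi * mod_freq_hz * depth_m / C
    return [amplitude * math.cos(phase + k * math.pi / 2.0) + offset
            for k in range(4)]


def depth_from_correlations(q, mod_freq_hz):
    """Recover depth from four samples; the atan2 makes this nonlinear."""
    q0, q1, q2, q3 = q
    # The differences cancel the constant offset; atan2 recovers the phase.
    phase = math.atan2(q3 - q1, q0 - q2) % (2.0 * math.pi)
    return C * phase / (4.0 * math.pi * mod_freq_hz)


def normalize_correlations(q, eps=1e-12):
    """Hypothetical normalization: scale samples so the peak magnitude is 1."""
    peak = max(abs(v) for v in q)
    return [v / max(peak, eps) for v in q]
```

Because the phase depends only on the ratio of the two differences, scaling all four samples by the same positive factor leaves the recovered depth unchanged. That property is what makes it plausible to compress the dynamic range of raw correlations before feeding them to an image model and still convert the output to metric depth afterwards.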

Paper Structure

This paper contains 14 sections, 13 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Results of ToF depth denoising on an example from the FLAT dataset guo2018tackling: (a) noisy input with high dynamic range of noise variance, results with (b) DNN-based RADU schelling2022radu, (c) diffusion-based Palette saharia2022palette and (d) proposed DepthCAD. As highlighted in black rectangles, RADU produces smooth but blurry details and Palette produces inaccurate depth estimates, while DepthCAD generates accurate depth with detail preservation.
  • Figure 2: Overview of the training process. We derive the Raw Data Diffusion Module from pretrained Stable Diffusion 2.1, performing diffusion on raw correlation $\tilde{{\mathbf x}}_f$ with dynamic range normalization to bridge the domain gap between ToF depth and color images, which lets the model exploit the rich prior knowledge of Stable Diffusion for enhanced global structural smoothness. To balance generative quality and fidelity preservation, we design the Confidence Guidance Module, which takes as guidance the confidence ${\mathbf c}$, obtained from gradient computation on the noisy depth ${\mathbf d}$, and the normalized noisy raw correlation $\tilde{{\mathbf y}}_f$. These guidance signals are fused with the inputs and influence the diffusion process via connections from the Guidance Diffusion Model to the Latent Diffusion U-Net. During training, all components in the Raw Data Diffusion Module are frozen, and we train the Confidence Guidance Module by optimizing the standard diffusion objective.
  • Figure 3: Comparison of reconstruction results using the Stable Diffusion VAE with different input formats. (a) ideal depth map, reconstruction results from (b) depth measurements, (c) unnormalized raw correlations and (d) normalized raw correlations.
  • Figure 4: Depth results and error maps of ToF depth denoising on the FLAT dataset guo2018tackling: (a) GT, (b) noisy depth, results of (c) ToFNet su2018deep, (d) UDA uda2019, (e) RADU schelling2022radu, (f) Palette saharia2022palette, (g) DepthGen saxena2023monocular, and (h) proposed DepthCAD. Corresponding error maps are in the second row. (i) shows the confidence map used as the guidance condition.
  • Figure 5: Visual results of ToF depth denoising on real data captured by a Kinect v2 sensor: (a) infrared image, (b) noisy depth captured by Kinect v2, and results of (c) ToFNet su2018deep, (d) UDA uda2019, (e) RADU schelling2022radu, (f) Palette saharia2022palette, (g) DepthGen saxena2023monocular, and (h) proposed DepthCAD.
  • ...and 1 more figure
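The Figure 2 caption says the confidence ${\mathbf c}$ comes from gradient computation on the noisy depth ${\mathbf d}$, but this summary does not give the formula. The sketch below shows one plausible way such a map could be built: pixels with large local depth variation get low confidence, and flat regions get confidence close to 1. The function name `gradient_confidence`, the exponential mapping, and the scale `sigma` are assumptions for illustration, not the authors' definition.

```python
import math


def gradient_confidence(depth, sigma=0.1):
    """Map local depth variation to a confidence in (0, 1].

    depth: 2D list of floats (rows of pixels).
    sigma: assumed scale; larger values tolerate larger gradients.
    """
    h, w = len(depth), len(depth[0])
    conf = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            d = depth[i][j]
            # Larger of the forward and backward differences along each axis;
            # indices are clamped, so a missing neighbor contributes zero.
            gx = max(abs(depth[i][min(j + 1, w - 1)] - d),
                     abs(d - depth[i][max(j - 1, 0)]))
            gy = max(abs(depth[min(i + 1, h - 1)][j] - d),
                     abs(d - depth[max(i - 1, 0)][j]))
            # Large variation maps to low confidence (assumed exponential form).
            conf[i][j] = math.exp(-math.hypot(gx, gy) / sigma)
    return conf
```

Taking the larger one-sided difference rather than a central difference is a deliberate choice in this sketch: a central difference is zero at an isolated outlier pixel, which would wrongly mark the outlier itself as fully trusted.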