CDGS: Confidence-Aware Depth Regularization for 3D Gaussian Splatting

Qilin Zhang, Olaf Wysocki, Steffen Urban, Boris Jutzi

TL;DR

This work addresses the limited geometric accuracy of 3D Gaussian Splatting (3DGS) by introducing CDGS, a confidence-aware depth regularization framework. CDGS comprises two components: (i) depth refinement and alignment, which produces geometry-consistent depths from monocular estimates and sparse SfM data, and (ii) confidence-aware depth regularization, which adaptively weights depth supervision using multi-cue confidence maps and an alignment loss $l_a$. On Tanks and Temples, CDGS converges more stably, preserves more geometric detail in early training, and delivers competitive novel-view synthesis quality, with consistently lower M3C2 geometric errors and PSNR gains of up to $2.31$ dB. The approach supports more accurate and efficient 3D reconstruction for real-world applications such as digital twins, heritage preservation, and forestry. The authors acknowledge remaining challenges in indoor scenes and under complex lighting, and suggest improved confidence estimation and additional geometric priors as future work.
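
The alignment of monocular depth to sparse SfM depth can be illustrated with a common technique: monocular estimates are often defined only up to an unknown scale and shift, so one fits both by least squares at the pixels where SfM points project. The sketch below assumes that formulation; it is not taken from the paper, and the function name `fit_scale_shift` and all sample values are hypothetical.

```python
def fit_scale_shift(mono, sfm):
    """Least-squares fit of sfm ≈ scale * mono + shift over paired samples.

    Assumption: a single global scale and shift per image is sufficient.
    """
    n = len(mono)
    if n < 2 or n != len(sfm):
        raise ValueError("need at least two paired depth samples")
    mean_m = sum(mono) / n
    mean_s = sum(sfm) / n
    var_m = sum((m - mean_m) ** 2 for m in mono)
    if var_m == 0.0:
        raise ValueError("monocular depths are constant; scale is undetermined")
    cov = sum((m - mean_m) * (s - mean_s) for m, s in zip(mono, sfm))
    scale = cov / var_m
    shift = mean_s - scale * mean_m
    return scale, shift


# Hypothetical samples: monocular depth read at pixels where SfM points project,
# paired with the depth of those SfM points in the camera frame.
mono_at_sfm = [0.2, 0.5, 0.8, 1.0]
sfm_depth = [1.4, 2.0, 2.6, 3.0]

scale, shift = fit_scale_shift(mono_at_sfm, sfm_depth)
# Apply the fitted transform to bring monocular depth into the SfM frame.
aligned = [scale * m + shift for m in mono_at_sfm]
```

In practice the fitted transform would be applied to the full dense depth map, and a robust estimator may be preferable when the sparse SfM points contain outliers.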

Abstract

3D Gaussian Splatting (3DGS) has shown significant advantages in novel view synthesis (NVS), particularly in achieving high rendering speeds and high-quality results. However, its geometric accuracy in 3D reconstruction remains limited due to the lack of explicit geometric constraints during optimization. This paper introduces CDGS, a confidence-aware depth regularization approach developed to enhance 3DGS. We leverage multi-cue confidence maps of monocular depth estimation and sparse Structure-from-Motion depth to adaptively adjust depth supervision during the optimization process. Our method demonstrates improved geometric detail preservation in early training stages and achieves competitive performance in both NVS quality and geometric accuracy. Experiments on the publicly available Tanks and Temples benchmark dataset show that our method achieves more stable convergence behavior and more accurate geometric reconstruction results, with improvements of up to 2.31 dB in PSNR for NVS and consistently lower geometric errors in M3C2 distance metrics. Notably, our method reaches comparable F-scores to the original 3DGS with only 50% of the training iterations. We expect this work will facilitate the development of efficient and accurate 3D reconstruction systems for real-world applications such as digital twin creation, heritage preservation, or forestry applications.
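
To make the idea of adaptively weighted depth supervision concrete, the sketch below computes a per-pixel absolute depth error and weights it by a confidence value in [0, 1], so that unreliable reference depths contribute little to the loss. This is an illustrative assumption, not the paper's loss: the abstract does not specify how the confidence maps are combined from multiple cues or how the loss is normalized, and the function name and sample values are hypothetical.

```python
def confidence_weighted_depth_loss(rendered, reference, confidence):
    """Confidence-weighted mean absolute error between two depth maps.

    All inputs are flat sequences of equal length (one value per pixel).
    Assumption: the loss is normalized by the total confidence.
    """
    if not (len(rendered) == len(reference) == len(confidence)):
        raise ValueError("inputs must have the same number of pixels")
    total_weight = sum(confidence)
    if total_weight == 0.0:
        return 0.0  # no trusted pixels, so no depth supervision
    weighted_error = sum(
        c * abs(r - d) for r, d, c in zip(rendered, reference, confidence)
    )
    return weighted_error / total_weight


# Hypothetical values: the third pixel has a large error but zero confidence.
rendered_depth = [2.0, 3.0, 5.0]
reference_depth = [2.5, 3.0, 9.0]
confidence_map = [1.0, 1.0, 0.0]

loss = confidence_weighted_depth_loss(rendered_depth, reference_depth, confidence_map)
uniform = confidence_weighted_depth_loss(rendered_depth, reference_depth, [1.0] * 3)
```

Comparing `loss` with `uniform` shows the intended effect: a single low-confidence outlier dominates the unweighted error but is suppressed once confidence weighting is applied.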

Paper Structure

This paper contains 11 sections, 13 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Overview of our confidence-aware depth regularization framework for 3DGS. Our method introduces three key components: i) depth refinement and alignment, ii) confidence-aware depth regularization through multi-cue feature analysis, and iii) comprehensive 2D and 3D evaluation metrics for assessing both rendering quality and geometric accuracy. This framework enables stable optimization and improved reconstruction results.
  • Figure 2: Qualitative comparison of results on the Ignatius (top two rows) and Truck (bottom two rows) scenes at iteration 9,000. For each scene: rows 1&3 show synthesized RGB images, rows 2&4 present corresponding depth maps. Reference depth maps are generated using Depth Anything V2, and comparison depth maps are rendered from respective 3D representations. Applying our method required an additional preprocessing time of 1.5 seconds per image on average, ensuring its uniform applicability across all inputs. Yellow boxes highlight regions where our method better preserves geometric and radiometric details.
  • Figure 3: Comparison of composite image loss during training across different scenes from the TnT dataset. Each plot shows the convergence behavior over 30,000 iterations, with our method (blue) and 3DGS (red). Lower values indicate better performance.
  • Figure 4: F-score evolution during training for Barn, Caterpillar, and Meeting Room scenes from the TnT dataset. Our method (CDGS, in blue), DRGS (in green), and 3DGS (in red). The y-axis range is set to 0-5%.
  • Figure 5: M3C2 distance analysis visualization of our CDGS reconstruction on the Caterpillar scene. The color bar indicates the signed distances to the ground truth surface, where green represents small distances ($\pm$0.05 m), blue indicates negative deviations (up to -0.4 m), and red shows positive deviations (up to 0.4 m).
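
The F-score reported in Figure 4 is, in the usual Tanks and Temples formulation, the harmonic mean of precision (the fraction of reconstructed points within a distance threshold of the ground truth) and recall (the fraction of ground-truth points within that threshold of the reconstruction). The sketch below is a brute-force illustration of that definition; the threshold, the point sets, and the function names are hypothetical, and the benchmark's own pipeline also performs alignment and resampling steps that are not shown here.

```python
import math


def nearest_distance(point, cloud):
    """Distance from `point` to its nearest neighbour in `cloud` (brute force)."""
    return min(math.dist(point, q) for q in cloud)


def f_score(reconstruction, ground_truth, threshold):
    """Harmonic mean of precision and recall at a given distance threshold."""
    precision = sum(
        nearest_distance(p, ground_truth) <= threshold for p in reconstruction
    ) / len(reconstruction)
    recall = sum(
        nearest_distance(g, reconstruction) <= threshold for g in ground_truth
    ) / len(ground_truth)
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical point sets (coordinates in metres).
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
reconstruction = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.02), (5.0, 0.0, 0.0)]

score = f_score(reconstruction, ground_truth, threshold=0.05)
```

Here two of three reconstructed points lie near the ground truth (precision 2/3) and two of four ground-truth points are covered (recall 1/2), which gives an F-score of 4/7. For real point clouds a spatial index such as a k-d tree would replace the brute-force search.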