Learning Fine-Grained Geometry for Sparse-View Splatting via Cascade Depth Loss

Wenjun Lu, Haodong Chen, Anqi Yi, Yuk Ying Chung, Zhiyong Wang, Kun Hu

TL;DR

This work tackles sparse-view novel view synthesis by introducing Hierarchical Depth-Guided Sparse-View 3D Gaussian Splatting (HDGS), which refines geometry from global to local detail using multi-scale monocular depth cues. A Cascade Pearson Correlation Loss (CPCL) enforces scale-aware, structure-preserving depth alignment across patches of varying sizes, while a linear kernel reduces smoothing and preserves high-frequency geometry. The approach is complemented by a dense initialization from VGGSfM and careful depth normalization (local and global) to stabilize training. Experiments on LLFF and DTU show state-of-the-art performance under sparse views with efficient rendering, demonstrating improved geometric fidelity and visual quality over both NeRF- and 3DGS-based baselines.

Abstract

Novel view synthesis is a fundamental task in 3D computer vision that aims to reconstruct realistic images from a set of posed input views. However, reconstruction quality degrades significantly under sparse-view conditions due to limited geometric cues. Existing methods, such as Neural Radiance Fields (NeRF) and the more recent 3D Gaussian Splatting (3DGS), often suffer from blurred details and structural artifacts when trained with insufficient views. Recent works have identified the quality of rendered depth as a key factor in mitigating these artifacts, as it directly affects geometric accuracy and view consistency. In this paper, we address these challenges by introducing Hierarchical Depth-Guided Splatting (HDGS), a depth supervision framework that progressively refines geometry from coarse to fine levels. Central to HDGS is a novel Cascade Pearson Correlation Loss (CPCL), which aligns rendered and estimated monocular depths across multiple spatial scales. By enforcing multi-scale depth consistency, our method substantially improves structural fidelity in sparse-view scenarios. Extensive experiments on the LLFF and DTU benchmarks demonstrate that HDGS achieves state-of-the-art performance under sparse-view settings while maintaining efficient and high-quality rendering.
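
To make the idea of aligning rendered and monocular depth across several spatial scales concrete, the following is a minimal sketch of a patch-wise Pearson correlation loss evaluated at more than one patch size. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pearson`, `patches`, `cascade_pearson_loss`), the non-overlapping tiling, the patch sizes, and the equal weighting of scales are all hypothetical choices. Pure Python is used only to keep the example self-contained; a real training loop would use a differentiable tensor library.

```python
import math


def pearson(xs, ys, eps=1e-8):
    """Pearson correlation of two equal-length sequences of floats."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # eps avoids division by zero on constant patches (assumed handling).
    return cov / (math.sqrt(var_x * var_y) + eps)


def patches(depth, size):
    """Yield flattened, non-overlapping size x size patches of a 2D list.

    Non-overlapping tiling is an assumption made for simplicity.
    """
    height, width = len(depth), len(depth[0])
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            yield [
                depth[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size)
            ]


def cascade_pearson_loss(rendered, mono, patch_sizes=(4, 2)):
    """Mean of (1 - Pearson) over patches, averaged over patch sizes.

    patch_sizes runs from coarse to fine; the sizes and the equal
    weighting of scales are hypothetical, not taken from the paper.
    """
    per_scale = []
    for size in patch_sizes:
        losses = [
            1.0 - pearson(p, q)
            for p, q in zip(patches(rendered, size), patches(mono, size))
        ]
        per_scale.append(sum(losses) / len(losses))
    return sum(per_scale) / len(per_scale)


# Toy 4x4 "rendered" depth map and a monocular estimate that differs
# only by an unknown positive scale and shift.
rendered = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
mono = [[2.0 * v + 3.0 for v in row] for row in rendered]
loss_aligned = cascade_pearson_loss(rendered, mono)

# A depth map with inverted ordering is maximally penalized.
inverted = [[-v for v in row] for row in rendered]
loss_inverted = cascade_pearson_loss(rendered, inverted)
```

Because Pearson correlation is unchanged by a positive scale and shift, `loss_aligned` is close to zero even though `mono` is not in the same units as `rendered`. That invariance is what makes a correlation-based loss a natural fit for monocular depth estimates, which are typically known only up to scale and shift. Evaluating the loss on smaller patches in addition to the full map penalizes local structural disagreements that a single global correlation could average away.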

Paper Structure

This paper contains 27 sections, 24 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Overview of our framework: Given sparse input views, we initialize 3D Gaussians using a dense point cloud and supervise geometry via hierarchical alignment between rendered and monocular depth maps. A cascade Pearson correlation loss is applied across multi-scale patches and normalization modes, enabling accurate reconstruction under sparse-view conditions.
  • Figure 2: Qualitative comparison of rendered RGB images and depth maps on the LLFF dataset using 3 input views.
  • Figure 3: Qualitative comparison of rendered RGB images and depth maps on the DTU dataset using 3 input views.