HBSplat: Robust Sparse-View Gaussian Reconstruction with Hybrid-Loss Guided Depth and Bidirectional Warping

Yu Ma, Guoliang Wei, Haihong Xiao, Yue Cheng

TL;DR

HBSplat tackles sparse-view novel view synthesis by unifying 3D Gaussian Splatting with three innovations: Hybrid-Loss Depth Estimation for robust multi-view depth, Bidirectional Warping Virtual View Synthesis for expanded, high-quality supervision, and Occlusion-Aware Reconstruction to recover unseen regions. The method leverages dense matching, ray-constrained optimization, and gradient/PCC-based losses to enforce geometric and photometric consistency, achieving up to 21.13 dB PSNR and 0.189 LPIPS while maintaining real-time rendering. Extensive experiments on LLFF, Blender, DTU, and Tanks&Temples demonstrate state-of-the-art performance under extreme sparsity, with strong efficiency advantages (≈250 FPS inference) and minimal training time. This work offers a practical, scalable solution for high-fidelity 3D reconstruction from very few input views, advancing real-time NVS in challenging 360° and forward-facing scenarios.

Abstract

Novel View Synthesis (NVS) from sparse views presents a formidable challenge in 3D reconstruction, where limited multi-view constraints lead to severe overfitting, geometric distortion, and fragmented scenes. While 3D Gaussian Splatting (3DGS) delivers real-time, high-fidelity rendering, its performance drastically deteriorates under sparse inputs, plagued by floating artifacts and structural failures. To address these challenges, we introduce HBSplat, a unified framework that elevates 3DGS by seamlessly integrating robust structural cues, virtual view constraints, and occluded region completion. Our core contributions are threefold: a Hybrid-Loss Depth Estimation module that ensures multi-view consistency by leveraging dense matching priors and integrating reprojection, point propagation, and smoothness constraints; a Bidirectional Warping Virtual View Synthesis method that enforces substantially stronger constraints by creating high-fidelity virtual views through bidirectional depth-image warping and multi-view fusion; and an Occlusion-Aware Reconstruction component that recovers occluded areas using a depth-difference mask and a learning-based inpainting model. Extensive evaluations on LLFF, Blender, and DTU benchmarks validate that HBSplat sets a new state-of-the-art, achieving up to 21.13 dB PSNR and 0.189 LPIPS, while maintaining real-time inference. Code is available at: https://github.com/eternalland/HBSplat.
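The reprojection constraint named in the abstract can be illustrated with a standard two-view residual: a matched pixel in view A is back-projected at a candidate depth, transformed into view B, and compared with its dense-matching partner. The sketch below is a generic formulation under stated assumptions (a shared pinhole intrinsic matrix `K`, a relative pose `R_ab`, `t_ab` from SfM, and the hypothetical function name `reprojection_error`). It is not taken from the HBSplat code base.

```python
import numpy as np

def reprojection_error(uv_a, depth_a, uv_b, K, R_ab, t_ab):
    """Pixel distance between matches in view B and points reprojected from view A.

    uv_a, uv_b : (N, 2) matched pixel coordinates in views A and B
    depth_a    : (N,) candidate depths (z in camera A) for the pixels in uv_a
    K          : (3, 3) pinhole intrinsics (assumption: shared by both views)
    R_ab, t_ab : (3, 3) rotation and (3,) translation mapping camera-A
                 coordinates to camera-B coordinates (assumed known from SfM)
    """
    ones = np.ones((uv_a.shape[0], 1))
    # Back-project pixels to rays with z = 1, then scale by the candidate depth.
    rays = (np.linalg.inv(K) @ np.hstack([uv_a, ones]).T).T
    pts_a = rays * depth_a[:, None]
    # Move the 3D points into camera B and project them with the intrinsics.
    pts_b = pts_a @ R_ab.T + t_ab
    proj = (K @ pts_b.T).T
    uv_proj = proj[:, :2] / proj[:, 2:3]
    # Euclidean pixel error against the matched location in view B.
    return np.linalg.norm(uv_proj - uv_b, axis=1)

# Toy example: a camera pair related by a pure 0.1-unit shift along x.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
R_ab, t_ab = np.eye(3), np.array([0.1, 0.0, 0.0])
uv_a = np.array([[50.0, 50.0]])
uv_b = np.array([[55.0, 50.0]])
err_good = reprojection_error(uv_a, np.array([2.0]), uv_b, K, R_ab, t_ab)
err_bad = reprojection_error(uv_a, np.array([1.0]), uv_b, K, R_ab, t_ab)
```

In this toy setup the residual vanishes at depth 2 and grows at depth 1, which is why minimizing it over per-point depths ties each depth to the multi-view matches.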

Paper Structure

This paper contains 31 sections, 29 equations, 14 figures, 8 tables, 1 algorithm.

Figures (14)

  • Figure 1: (a) MCGS outputs (left) vs. HBSplat outputs (right) for both rendered image and depth map. (b) presents an efficiency-quality scatter plot comparing HBSplat with various baseline methods.
  • Figure 2: HBSplat pipeline. First, sparse input images are processed by dense matching, structure from motion (SfM), and monocular depth estimation to obtain correspondences, camera poses, and depth maps. The Hybrid-Loss module fuses these inputs to produce robust point-wise depths. Subsequent least-squares optimization aligns the point cloud with monocular depths, recovering the metric scale. The Bidirectional Warping module leverages these aligned depths and images to synthesize novel virtual training views through depth-image warping and interpolation. Simultaneously, the Occlusion-Aware Reconstruction component restores missing background content in occluded regions using a learning-based inpainting model (Simple-LAMA) guided by a local foreground mask. Finally, the framework reconstructs the 3D Gaussian scene by optimizing a joint loss that combines color and depth supervision.
  • Figure 3: Hybrid-Loss Depth Estimation pipeline, which first estimates initial point depths from densely matched points using reprojection and point propagation constraints, then filters outliers. During scene training, the rendered depth is refined under reprojection, point propagation, and total variation (TV) smoothness constraints. Point propagation constraints are computed from the common points. Nearest-neighbor view selection and secondary filtering reduce redundant computation.
  • Figure 4: Visual comparison of matching pair images and depth maps. Panels (a) and (b) show the matching pair images (left, middle) and corresponding depth maps (right) for SCGaussian and HBSplat, respectively. SCGaussian relies on the RANSAC algorithm for outlier filtering, whereas HBSplat employs its own Outlier Filtering Mechanism. The comparison demonstrates that HBSplat removes outliers more effectively while preserving a greater number of valid points.
  • Figure 5: Bidirectional Warping Virtual View Synthesis Pipeline. The monocular depth maps from the real views are aligned with sparse depth points via least-squares optimization. Depth Warping generates virtual depth maps, filling holes. Image Warping samples real views to create virtual views. Distance scores between real and virtual views are computed, warping multiple optimal real views to a single virtual view. The nearest-neighbor virtual view serves as the base for Multi-View Fusion. The 3D Gaussian scene is reconstructed using gradient-domain loss and Pearson Correlation Coefficient (PCC) loss constraints.
  • ...and 9 more figures
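
Two operations named in the captions above have compact closed forms: the least-squares alignment of monocular depth to sparse point depths (Figures 2 and 5) and the Pearson Correlation Coefficient (PCC) loss (Figure 5). The sketch below is a minimal illustration under assumptions. It fits a scale and a shift in depth space (the captions do not say whether a shift is included or whether inverse depth is used), and it defines the PCC loss as one minus the correlation. The function names `align_scale_shift` and `pcc_loss` are hypothetical and not taken from the HBSplat code base.

```python
import numpy as np

def align_scale_shift(mono_depth, sparse_depth):
    """Fit s, b minimizing ||s * mono_depth + b - sparse_depth||^2.

    mono_depth   : (N,) monocular depth sampled at the sparse point locations
    sparse_depth : (N,) depths of the same points from the sparse reconstruction
    Assumption: an affine (scale + shift) model in depth space.
    """
    A = np.stack([mono_depth, np.ones_like(mono_depth)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, sparse_depth, rcond=None)
    return s, b

def pcc_loss(rendered, reference, eps=1e-8):
    """1 - Pearson correlation between two depth maps.

    The correlation is invariant to positive scale and shift, so the loss
    compares relative depth structure rather than absolute values.
    """
    r = rendered.ravel() - rendered.mean()
    q = reference.ravel() - reference.mean()
    corr = (r @ q) / (np.linalg.norm(r) * np.linalg.norm(q) + eps)
    return 1.0 - corr

# Toy data: sparse depths are an exact affine function of the monocular depths.
mono = np.array([1.0, 2.0, 3.0, 4.0])
sparse = 2.0 * mono + 0.5
s, b = align_scale_shift(mono, sparse)
aligned = s * mono + b                        # monocular depth in the sparse scale
loss_same = pcc_loss(mono, 3.0 * mono + 1.0)  # perfectly correlated maps
loss_flip = pcc_loss(mono, -mono)             # perfectly anti-correlated maps
```

The alignment recovers the scale and shift exactly on this noise-free toy data. The PCC loss is near zero for any positively rescaled copy of a depth map, which makes it a natural companion to monocular depth whose absolute scale is unreliable.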