Improving Geometry in Sparse-View 3DGS via Reprojection-based DoF Separation

Yongsung Kim, Minjun Park, Jooyoung Choi, Sungroh Yoon

TL;DR

This work targets the geometry artifacts that arise when 3D Gaussian Splatting (3DGS) is used to refine sparse-view 3D geometry. It introduces a reprojection-based DoF separation that splits the positional DoFs of each Gaussian into image-plane-parallel DoFs and a ray-aligned DoF. The former are constrained by a bounded offset and the latter by a visibility loss, so that the per-view depth priors from learning-based MVS are retained during refinement. The method preserves depth information while suppressing texture-driven geometric distortions, and it improves geometric plausibility (measured by $PDC$) with competitive rendering quality on Mip-NeRF 360, MVImgNet, and Tanks and Temples. Ablation studies confirm the importance of both the bounded offset and the visibility loss. The discussed limitations concern the pose estimation of the underlying MVS model and the handling of specular surfaces. Overall, the approach advances sparse-view 3DGS by prioritizing geometric plausibility alongside photometric fidelity, and it suggests that future work should report geometry-focused metrics in addition to $PSNR$.

Abstract

Recent learning-based Multi-View Stereo models have demonstrated state-of-the-art performance in sparse-view 3D reconstruction. However, directly applying 3D Gaussian Splatting (3DGS) as a refinement step following these models presents challenges. We hypothesize that the excessive positional degrees of freedom (DoFs) in Gaussians induce geometry distortion, fitting color patterns at the cost of structural fidelity. To address this, we propose reprojection-based DoF separation, a method distinguishing positional DoFs in terms of uncertainty: image-plane-parallel DoFs and ray-aligned DoF. To independently manage each DoF, we introduce a reprojection process along with tailored constraints for each DoF. Through experiments across various datasets, we confirm that separating the positional DoFs of Gaussians and applying targeted constraints effectively suppresses geometric artifacts, producing reconstruction results that are both visually and geometrically plausible.
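
The reprojection idea in the abstract can be illustrated with a minimal sketch. It assumes a pinhole camera and places each point by back-projecting a pixel with its per-view depth (the ray-aligned DoF) after shifting the pixel by a small offset (the image-plane-parallel DoFs). The function name `reproject`, the `tanh` bounding, and the `max_offset` value are illustrative assumptions, not the authors' formulation; the source only states that the offset is bounded.

```python
import numpy as np

def reproject(pixels, depth, K, cam_to_world, raw_offset, max_offset=0.5):
    """Back-project pixels to world-space points (hypothetical sketch).

    pixels:       (N, 2) pixel coordinates (u, v)
    depth:        (N,)   z-depth along each ray (ray-aligned DoF)
    K:            (3, 3) pinhole intrinsics
    cam_to_world: (4, 4) camera-to-world pose
    raw_offset:   (N, 2) unconstrained parameters for the image-plane DoFs
    max_offset:   bound on the pixel offset (assumed value, not from the paper)
    """
    # Assumption: squash the offset with tanh so it stays within +/- max_offset pixels.
    offset = max_offset * np.tanh(raw_offset)
    uv = pixels + offset

    # Homogeneous pixel coordinates, then ray directions with z = 1 in the camera frame.
    homog = np.concatenate([uv, np.ones((uv.shape[0], 1))], axis=1)
    rays_cam = homog @ np.linalg.inv(K).T

    # Scale each ray by its depth, then move the points into the world frame.
    pts_cam = rays_cam * depth[:, None]
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t
```

In this parameterization the depth and the offset are separate variables, so each can receive its own constraint: however large `raw_offset` grows during optimization, the point cannot slide more than `max_offset` pixels across the image plane.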

Paper Structure

This paper contains 22 sections, 8 figures, 3 tables.

Figures (8)

  • Figure 1: 3D reconstruction results with 12 views. The image on the left side is the rendered depth image, and the upper right image is the rendered RGB image. The lower right image visualizes patch-wise depth correlation, where green indicates accurate geometry. As the color shifts from green to gray and then to purple, the patch-wise depth correlation decreases, indicating less plausible geometry. Our method qualitatively demonstrates more uniform and realistic geometry, which is also evident from the higher patch-wise depth correlation.
  • Figure 2: Geometric artifacts from naive 3DGS refinement. Texture representation via Gaussian positions introduces unintended geometric patterns. The red box highlights excessive distortion in the ceiling geometry, and the blue box shows gaps in the flat floor geometry following texture patterns.
  • Figure 3: Overview of the proposed framework. (a) Scene initialization using a learning-based MVS model, which predicts 3D points from images and outputs per-view depth as an intermediate representation. (b) Naïve implementation, where MVS is treated as a black-box model, and its output is refined using the 3DGS pipeline. (c) Our proposed framework, which introduces reprojection-based refinement by retaining intermediate per-view depth as a trainable target. A visibility loss function is used to resolve conflicts when aligning individual per-view depths into a shared coordinate system.
  • Figure 4: Qualitative comparison on Tanks and Temples. Our method outperforms the baselines not only in reconstructing smooth surfaces, such as the floor, but also in capturing the geometry of complex shapes like statues. As the color shifts from green to gray and then to purple, the patch-wise depth correlation decreases, indicating less plausible geometry.
  • Figure 5: Qualitative comparison on Mip-NeRF 360. In the Counter scene, the baselines represent the patterns of the tray and tablecloth as geometric artifacts, whereas our method more plausibly captures the geometry. Additionally, in the Bonsai scene, the baselines produce numerous floaters near the piano and bicycle, while our method represents the geometry without floaters.
  • ...and 3 more figures
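
The captions for Figures 1 and 4 refer to patch-wise depth correlation without defining it. One plausible reading, shown here only as a hedged sketch, is a Pearson correlation computed independently on each image patch between a rendered depth map and a reference depth map. The function name, the patch size, and the choice of Pearson correlation are assumptions made for illustration; the paper's exact $PDC$ definition may differ.

```python
import numpy as np

def patchwise_depth_correlation(depth_a, depth_b, patch=8, eps=1e-8):
    """Pearson correlation per non-overlapping patch (hypothetical sketch).

    depth_a, depth_b: (H, W) depth maps, e.g. rendered and reference depth
    patch:            patch side length in pixels (assumed value)
    Returns an (H // patch, W // patch) array of correlations in [-1, 1].
    """
    H, W = depth_a.shape
    rows, cols = H // patch, W // patch
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            window = (slice(i * patch, (i + 1) * patch),
                      slice(j * patch, (j + 1) * patch))
            a = depth_a[window].ravel()
            b = depth_b[window].ravel()
            # Center each patch so the score ignores a per-patch depth shift.
            a = a - a.mean()
            b = b - b.mean()
            # Normalized dot product; eps avoids division by zero on flat patches.
            out[i, j] = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return out
```

Because each patch is centered and normalized separately, a score of this kind is insensitive to per-patch scale and shift in depth and responds only to local structure, which fits the captions' use of it to flag locally implausible geometry.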