How to Use Diffusion Priors under Sparse Views?

Qisen Wang, Yifan Zhao, Jiawei Ma, Jia Li

TL;DR

The paper addresses sparse-view novel view synthesis and diagnoses Score Distillation Sampling (SDS) with diffusion priors as prone to mode deviation in this setting. It introduces Inline Prior Guided Score Matching (IPSM), which uses inline priors derived from the pose relationships between viewpoints to rectify the rendered image distribution and decomposes the SDS objective into two KL-based subproblems, enabling diffusion supervision without fine-tuning. The IPSM-Gaussian pipeline couples IPSM with 3D Gaussian Splatting and adds depth and geometry consistency regularization to reinforce the inline priors and the rectified distribution. Experiments on LLFF and DTU show state-of-the-art reconstruction quality under sparse views. The code is released for reproducibility.

Abstract

Novel view synthesis under sparse views has been a long-term important challenge in 3D reconstruction. Existing works mainly rely on introducing external semantic or depth priors to supervise the optimization of 3D representations. However, the diffusion model, as an external prior that can directly provide visual supervision, has always underperformed in sparse-view 3D reconstruction using Score Distillation Sampling (SDS) due to the low information entropy of sparse views compared to text, leading to optimization challenges caused by mode deviation. To this end, we present a thorough analysis of SDS from the mode-seeking perspective and propose Inline Prior Guided Score Matching (IPSM), which leverages visual inline priors provided by pose relationships between viewpoints to rectify the rendered image distribution and decomposes the original optimization objective of SDS, thereby offering effective diffusion visual guidance without any fine-tuning or pre-training. Furthermore, we propose the IPSM-Gaussian pipeline, which adopts 3D Gaussian Splatting as the backbone and supplements depth and geometry consistency regularization based on IPSM to further improve inline priors and rectified distribution. Experimental results on different public datasets show that our method achieves state-of-the-art reconstruction quality. The code is released at https://github.com/iCVTEAM/IPSM.
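
For reference, the SDS objective that the abstract analyzes is usually written as below. This is the standard form from the text-to-3D literature, not necessarily the paper's own notation: the renderer $g$, 3D parameters $\theta$, camera $c$, condition $y$, noise predictor $\epsilon_\phi$, weighting $w(t)$, and noise schedule $\alpha_t, \sigma_t$ are symbols assumed here for illustration.

$$
x_0 = g(\theta, c), \qquad x_t = \alpha_t x_0 + \sigma_t \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
$$

$$
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta) = \mathbb{E}_{t,\epsilon}\left[ w(t)\,\big(\epsilon_\phi(x_t; y, t) - \epsilon\big)\,\frac{\partial x_0}{\partial \theta} \right]
$$

Up to a timestep-dependent weight, this gradient is that of a reverse KL divergence $\mathrm{KL}\big(q(x_t \mid x_0) \,\|\, p_\phi(x_t; y, t)\big)$ averaged over $t$. Reverse KL is mode-seeking, which is the perspective the abstract takes: when the condition $y$ carries little information, the rendered distribution can be pulled toward a mode other than the intended one. The paper's decomposition into two sub-objectives is not reproduced here.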

Paper Structure

This paper contains 27 sections, 17 equations, 10 figures, 15 tables.

Figures (10)

  • Figure 1: Dilemma of SDS. Average PSNR$\uparrow$, SSIM$\uparrow$, and LPIPS$\downarrow$ at each iteration on the LLFF test dataset with Base (without SDS), SDS (CFG=7.5), and SDS (CFG=100). The prior is added from iteration 2K to iteration 9.5K. The opacity is also reset at 2K. The details and final training results of SDS are shown in Sec. \ref{sec:comp_sds}.
  • Figure 2: Comparison of SDS and IPSM. Left: SDS tends to seek the nearest mode, causing mode deviation. Right: IPSM rectifies the distribution to seek the target mode.
  • Figure 3: IPSM-Gaussian obtains the inline prior within sparse views by inversely warping seen views to unseen pseudo views, thus modifying the rendered image distribution into the rectified distribution. Taking the rectified distribution as an intermediate state, two sub-optimization objectives then control the optimization direction.
  • Figure 4: Qualitative comparison on the LLFF dataset.
  • Figure 5: Qualitative comparison on DTU.
  • ...and 5 more figures
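
The inverse warping mentioned in the Figure 3 caption can be illustrated with a generic backward warp. The sketch below is not the authors' implementation: the function name `inverse_warp`, the use of a known pseudo-view depth map, shared pinhole intrinsics, and nearest-neighbor sampling without occlusion handling are all assumptions made for illustration.

```python
import numpy as np

def inverse_warp(src_img, tgt_depth, K, T_tgt_to_src):
    """Fill a target (pseudo) view by sampling a source (seen) view.

    Hypothetical helper, not taken from the paper's code.
    src_img:      (H, W, C) seen-view image
    tgt_depth:    (H, W) depth of the pseudo view (assumed to be available)
    K:            (3, 3) pinhole intrinsics, assumed shared by both views
    T_tgt_to_src: (4, 4) rigid transform from pseudo-view to seen-view frame
    Returns the warped (H, W, C) image and an (H, W) boolean validity mask.
    """
    H, W = tgt_depth.shape
    # Homogeneous pixel grid of the target view, shape (3, H*W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    # Back-project target pixels to 3D points in the target camera frame.
    pts_tgt = (np.linalg.inv(K) @ pix) * tgt_depth.reshape(1, -1)
    # Move the points into the source camera frame.
    pts_src = T_tgt_to_src[:3, :3] @ pts_tgt + T_tgt_to_src[:3, 3:4]
    # Project into the source image plane (guard against division by zero).
    z = pts_src[2]
    safe_z = np.where(z > 1e-8, z, 1.0)
    proj = K @ pts_src
    us = np.rint(proj[0] / safe_z).astype(int)
    vs = np.rint(proj[1] / safe_z).astype(int)
    # Keep only points in front of the camera that land inside the image.
    valid = (z > 1e-8) & (us >= 0) & (us < W) & (vs >= 0) & (vs < H)
    warped = np.zeros((H * W, src_img.shape[2]), dtype=src_img.dtype)
    warped[valid] = src_img[vs[valid], us[valid]]  # nearest-neighbor lookup
    return warped.reshape(H, W, -1), valid.reshape(H, W)
```

Pixels whose mask entry is `False` have no correspondence in the seen view and are left at zero. A practical pipeline would likely use differentiable bilinear sampling and some occlusion check in place of the nearest-neighbor lookup used here.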