RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors
Avinash Paliwal, Xilong Zhou, Wei Ye, Jinhui Xiong, Rakesh Ranjan, Nima Khademi Kalantari
TL;DR
RI3D tackles sparse-view 3D reconstruction by decoupling the reconstruction of visible regions from the hallucination of missing regions. It does this with two diffusion priors, each tailored to one task, inside a 3D Gaussian Splatting framework. It also introduces a depth-aware Gaussian initialization that fuses 3D-consistent DUSt3R depth with detailed monocular depth, which enables dense, per-pixel Gaussian placement. A two-stage optimization uses a repair diffusion model to constrain visible regions and an inpainting diffusion model to hallucinate missing areas, with iterative refinement to keep the result 3D-coherent. On Mip-NeRF 360 and CO3D, RI3D produces detailed textures in occluded regions and strong perceptual quality (LPIPS) while remaining competitive in PSNR/SSIM, and it outperforms several baselines in challenging sparse-input scenarios.
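The depth-aware initialization can be illustrated with a minimal NumPy sketch. It swaps in a common technique, a least-squares scale-and-shift alignment of the relative monocular depth to the DUSt3R depth, which is an assumption and not necessarily how RI3D fuses the two depths. It also assumes that the monocular output is relative depth (not disparity), a pinhole camera with intrinsics `K`, and a 4x4 camera-to-world matrix. The function names `align_scale_shift` and `unproject` are hypothetical. Each back-projected pixel then serves as the center of one Gaussian.

```python
import numpy as np

def align_scale_shift(rel_depth, ref_depth, valid):
    """Least-squares fit of s, t so that s * rel_depth + t matches ref_depth on valid pixels.

    Assumption (not from the paper): rel_depth is relative depth, not disparity,
    with an unknown global scale and shift.
    """
    x = rel_depth[valid]
    y = ref_depth[valid]
    A = np.stack([x, np.ones_like(x)], axis=1)      # design matrix [rel, 1]
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)  # closed-form least squares
    return s * rel_depth + t                        # detailed depth in the reference depth's scale

def unproject(depth, K, cam_to_world):
    """Back-project every pixel to 3D; each point can seed one Gaussian center."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # integer pixel coordinates, shape (h, w)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T                 # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)           # scale each ray by its z-depth
    pts_h = np.hstack([pts_cam, np.ones((pts_cam.shape[0], 1))])
    return (pts_h @ cam_to_world.T)[:, :3]          # camera -> world coordinates

# Toy usage: a synthetic "DUSt3R" depth and a relative depth with unknown scale/shift.
rng = np.random.default_rng(0)
ref = rng.uniform(1.0, 5.0, size=(4, 6))
rel = (ref - 0.3) / 2.0
fused = align_scale_shift(rel, ref, np.ones_like(ref, dtype=bool))
K = np.array([[100.0, 0.0, 3.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
centers = unproject(fused, K, np.eye(4))
print(centers.shape)  # (24, 3)
```

A least-squares fit is used here only because it has a closed-form solution. In practice, a robust fit, or a fit in disparity space, may behave better when the monocular depth contains outliers.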
Abstract
In this paper, we propose RI3D, a novel approach based on 3D Gaussian Splatting (3DGS) that harnesses diffusion models to reconstruct high-quality novel views from a sparse set of input images. Our key contribution is separating the view synthesis process into two tasks, reconstructing visible regions and hallucinating missing regions, and introducing two personalized diffusion models, each tailored to one of these tasks. Specifically, one model ('repair') takes a rendered image as input and predicts the corresponding high-quality image, which is then used as a pseudo ground truth image to constrain the optimization. The other model ('inpainting') focuses primarily on hallucinating details in unobserved areas. To integrate these models effectively, we introduce a two-stage optimization strategy: the first stage reconstructs visible areas using the repair model, and the second stage reconstructs missing regions with the inpainting model while ensuring coherence through further optimization. Moreover, we augment the optimization with a novel Gaussian initialization method that obtains per-image depth by combining 3D-consistent, smooth depth with highly detailed relative depth. We demonstrate that by separating the process into two tasks and addressing them with the repair and inpainting models, we produce results with detailed textures in both visible and missing regions that outperform state-of-the-art approaches on a diverse set of scenes with extremely sparse inputs.
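To make the two-stage strategy concrete, here is a hypothetical masked L1 objective written in NumPy. It assumes three inputs: a boolean visibility mask marking pixels observed by the input views, a repair-model output that serves as pseudo ground truth on visible pixels, and an inpainting-model output that is used on missing pixels only in the second stage. The names `masked_l1` and `two_stage_loss` and the weight `w_missing` are illustrative. The paper's actual losses, masks, and weights may differ.

```python
import numpy as np

def masked_l1(pred, target, mask):
    """Mean absolute error over pixels where mask is True (images HxWxC, mask HxW)."""
    if not mask.any():
        return 0.0
    return float(np.abs(pred - target)[mask].mean())

def two_stage_loss(rendered, repaired, inpainted, visible, stage, w_missing=1.0):
    """Hypothetical objective: stage 1 fits visible pixels to the repair output;
    stage 2 additionally fits missing pixels to the inpainting output."""
    loss = masked_l1(rendered, repaired, visible)
    if stage == 2:
        loss += w_missing * masked_l1(rendered, inpainted, ~visible)
    return loss

# Toy 4x4 RGB example: the left half is visible, the right half is missing.
visible = np.zeros((4, 4), dtype=bool)
visible[:, :2] = True
rendered = np.zeros((4, 4, 3))
repaired = np.full((4, 4, 3), 0.5)    # stand-in for the repair model's pseudo ground truth
inpainted = np.full((4, 4, 3), 0.25)  # stand-in for the inpainting model's output
loss_stage1 = two_stage_loss(rendered, repaired, inpainted, visible, stage=1)
loss_stage2 = two_stage_loss(rendered, repaired, inpainted, visible, stage=2)
print(loss_stage1, loss_stage2)  # 0.5 0.75
```

Keeping the repair supervision active in the second stage is one way to prevent already-reconstructed visible regions from drifting while missing regions are filled in. The `w_missing` weight is an illustrative knob, not a value taken from the paper.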
