RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors

Avinash Paliwal, Xilong Zhou, Wei Ye, Jinhui Xiong, Rakesh Ranjan, Nima Khademi Kalantari

TL;DR

RI3D tackles sparse-view 3D reconstruction by decoupling visible-region reconstruction from missing-region hallucination using two diffusion priors tailored to each task within a 3D Gaussian Splatting framework. It introduces a depth-aware Gaussian initialization that fuses 3D-consistent DUSt3R depth with detailed monocular depth, enabling dense, per-pixel Gaussian placement. A two-stage optimization uses a repair diffusion model to constrain visible regions and an inpainting diffusion model to hallucinate missing areas, with iterative refinement to ensure 3D coherence. On Mip-NeRF 360 and CO3D, RI3D produces detailed textures in occluded regions and strong perceptual quality (LPIPS) while remaining competitive in PSNR/SSIM, outperforming several baselines in challenging sparse-input scenarios.

Abstract

In this paper, we propose RI3D, a novel 3DGS-based approach that harnesses the power of diffusion models to reconstruct high-quality novel views given a sparse set of input images. Our key contribution is separating the view synthesis process into two tasks of reconstructing visible regions and hallucinating missing regions, and introducing two personalized diffusion models, each tailored to one of these tasks. Specifically, one model ('repair') takes a rendered image as input and predicts the corresponding high-quality image, which in turn is used as a pseudo ground truth image to constrain the optimization. The other model ('inpainting') primarily focuses on hallucinating details in unobserved areas. To integrate these models effectively, we introduce a two-stage optimization strategy: the first stage reconstructs visible areas using the repair model, and the second stage reconstructs missing regions with the inpainting model while ensuring coherence through further optimization. Moreover, we augment the optimization with a novel Gaussian initialization method that obtains per-image depth by combining 3D-consistent and smooth depth with highly detailed relative depth. We demonstrate that by separating the process into two tasks and addressing them with the repair and inpainting models, we produce results with detailed textures in both visible and missing regions that outperform state-of-the-art approaches on a diverse set of scenes with extremely sparse inputs.
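
As a rough illustration of what per-pixel Gaussian placement from a per-image depth map can look like, the sketch below unprojects every pixel of a depth map into a camera-space 3D point that could serve as a Gaussian center. It assumes a simple pinhole camera; the function name `unproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and not taken from the paper, which may handle cameras and poses differently.

```python
def unproject_depth(depth, fx, fy, cx, cy):
    """Map each pixel (u, v) with depth d to a camera-space point.

    Assumes a pinhole camera (illustrative assumption, not the paper's code):
        x = (u - cx) * d / fx,  y = (v - cy) * d / fy,  z = d
    `depth` is a list of rows; the result is one (x, y, z) tuple per pixel,
    in row-major order.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


# Tiny 2x2 depth map with a hypothetical principal point at the image center.
depth_map = [[2.0, 2.0],
             [4.0, 4.0]]
centers = unproject_depth(depth_map, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

In a full pipeline these camera-space points would still need to be transformed to world space with each view's pose before being used as initial Gaussian positions.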

Paper Structure

This paper contains 26 sections, 4 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: We introduce a novel sparse view synthesis method that employs two diffusion models, "repair" and "inpainting", which are responsible for aiding in the reconstruction of visible regions and hallucinating missing regions, respectively. Our approach involves a two-stage optimization process. In the first stage, we use the repair model to constrain the 3DGS optimization and reconstruct the regions covered by the input images. As shown, the output of the first stage properly reconstructs the visible areas, but contains missing regions, which are marked in white. In the second stage, we utilize the inpainting model to fill in these missing areas and continue optimization using the repair model to seamlessly integrate the hallucinated regions with the rest of the scene. Here, we compare our method ("Stage 2") against several state-of-the-art techniques on a 360° scene using only three input images.
  • Figure 2: We provide an overview of the different stages of our approach. First, we initialize the Gaussians by generating high-quality per-view depth maps (Sec. \ref{ssec:init}). Next, we fine-tune the repair and inpainting diffusion models on the scene at hand (Sec. \ref{ssec:personalization}). Finally, we use these models to optimize the 3DGS representation in two stages (Sec. \ref{ssec:opt}). In the first stage, we reconstruct the areas covered by the input images (blue), using the repair model to generate pseudo ground truth images at $M$ novel views (orange) to constrain the optimization. In the second stage, we first select a subset of novel views (green) to inpaint the missing regions (left) and continue the optimization using the repair model (right). This process of inpainting and optimization is repeated multiple times until all missing areas are reconstructed.
  • Figure 3: The depth estimated by DUSt3R is geometrically consistent in the high-confidence regions (marked in yellow), but of poor quality in the remaining areas. Monocular depth is highly detailed, but is not 3D consistent. Our proposed method combines the two depth maps into a detailed and geometrically consistent depth. Applying bilateral filtering further sharpens the boundaries.
  • Figure 4: We show the output at different stages of our approach. Our initialization strategy ensures that Gaussians from different input images are roughly aligned and cover the visible areas of the scene. During the first stage of optimization, we use the repair model to constrain the problem, which in turn helps reconstruct the visible regions with detailed texture. The missing areas are then hallucinated and seamlessly incorporated into the scene during the second and final stage of optimization.
  • Figure 5: We show the result of 3DGS optimization using our initialization. In the absence of any constraints, 3DGS optimization quickly overfits to the input images (compare rendered and input images) but produces distracting artifacts in the novel view image. Additionally, unobserved areas will not be reconstructed during optimization, resulting in a dark and blurry appearance. We address these issues using our repair and inpainting models to constrain the optimization and hallucinate missing areas.
  • ...and 7 more figures
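
The Figure 3 caption describes combining a 3D-consistent depth that is reliable only in high-confidence regions with a detailed but inconsistent monocular depth. The captions do not state how the two are fused, so the sketch below shows one simple and widely used stand-in: fit a global scale and shift by least squares on high-confidence pixels, then apply that mapping to the whole monocular depth map. The function names, the `conf_thresh` parameter, and the choice of a single global affine fit are all assumptions for illustration, not the paper's procedure.

```python
def fit_scale_shift(mono, ref, conf, conf_thresh=0.5):
    """Least-squares fit of ref ≈ a * mono + b over confident pixels.

    mono, ref, conf are same-shaped lists of rows. Only pixels with
    conf >= conf_thresh contribute (hypothetical threshold).
    """
    xs, ys = [], []
    for m_row, r_row, c_row in zip(mono, ref, conf):
        for m, r, c in zip(m_row, r_row, c_row):
            if c >= conf_thresh:
                xs.append(m)
                ys.append(r)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two confident pixels")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("monocular depth is constant on confident pixels")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x          # scale
    b = mean_y - a * mean_x     # shift
    return a, b


def align_depth(mono, a, b):
    """Apply the fitted scale and shift to every monocular depth value."""
    return [[a * m + b for m in row] for row in mono]


# Toy example: the reference equals 2 * mono + 1 on confident pixels and is
# unreliable (value 100.0, confidence 0) at the last pixel.
mono = [[0.0, 1.0], [2.0, 3.0]]
ref = [[1.0, 3.0], [5.0, 100.0]]
conf = [[1.0, 1.0], [1.0, 0.0]]
a, b = fit_scale_shift(mono, ref, conf)
fused = align_depth(mono, a, b)
```

In this toy case the low-confidence pixel takes its value from the rescaled monocular depth instead of the unreliable reference. A single global fit cannot correct spatially varying inconsistencies, so a practical method would likely need something more local.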