
3D Gaussian Inpainting with Depth-Guided Cross-View Consistency

Sheng-Yu Huang, Zi-Ting Chou, Yu-Chiang Frank Wang

TL;DR

The paper addresses the challenge of multi-view consistent 3D inpainting for scenes represented by 3D Gaussian Splatting and related neural rendering methods. It introduces 3DGIC, a two-stage framework that first infers depth-guided inpainting masks across multiple views and then refines the 3DGS with cross-view supervision derived from a reference-view inpainting. The key contributions are the depth-guided mask inference, the inpainting-guided 3DGS refinement with cross-view losses, and strong empirical results on SPIn-NeRF and additional 3D scenes demonstrating improved fidelity and consistency. The approach enables reliable, editable 3D scenes for practical VR/AR applications by achieving higher-quality, cross-view coherent inpainting than prior methods.

Abstract

When performing 3D inpainting with novel-view rendering methods such as Neural Radiance Fields (NeRF) or 3D Gaussian Splatting (3DGS), achieving texture and geometry consistency across camera views remains a challenge. In this paper, we propose 3D Gaussian Inpainting with Depth-Guided Cross-View Consistency (3DGIC), a framework for cross-view consistent 3D inpainting. Guided by the depth rendered from each training view, 3DGIC exploits background pixels visible across different views to update the inpainting mask, allowing us to refine the 3DGS for inpainting. Through extensive experiments on benchmark datasets, we confirm that 3DGIC outperforms current state-of-the-art 3D inpainting methods both quantitatively and qualitatively.
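The depth-guided mask update rests on reprojecting background pixels from one view into another using rendered depth. Below is a minimal sketch of that geometry, not the paper's own formulation. It assumes pinhole intrinsics $K_k$ and world-to-camera extrinsics $(R_k, t_k)$ for camera pose $\xi_k$ (these symbols are assumptions, not the paper's notation). Here $\tilde{p}$ is the homogeneous coordinate of pixel $p$ in view $k$, $D_k(p)$ is its rendered z-depth, $M_k$ is the object mask of view $k$ (1 = object), and view 1 is the reference view:

```latex
\begin{aligned}
X_k(p) &= R_k^{\top}\bigl(D_k(p)\,K_k^{-1}\tilde{p} - t_k\bigr), \\
\tilde{p}' &\simeq K_1\bigl(R_1\,X_k(p) + t_1\bigr), \\
M'_1(p') &\leftarrow 0 \quad \text{if } M_k(p) = 0 \text{ and } M'_1(p') = 1 .
\end{aligned}
```

In words, a pixel that is background in view $k$ is lifted to the 3D point $X_k(p)$ and projected into the reference view ($\simeq$ denotes equality up to homogeneous scale). If it lands inside the current mask, that reference pixel is treated as visible background rather than a region to inpaint.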

Paper Structure

This paper contains 27 sections, 10 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Overview of 3D Gaussian Inpainting with Depth-Guided Cross-View Consistency. Given a 3D Gaussian Splatting model $G_{1:N}$ pretrained on multi-view images $I_{1:K}$ at camera poses $\xi_{1:K}$, our goal is to perform 3D inpainting based on the object masks $M_{1:K}$ (e.g., provided by SAM). Using the rendered depth maps $D_{1:K}$, the Inferring Depth-Guided Inpainting Mask stage refines the inpainting masks to preserve backgrounds visible across camera views. The Inpainting-Guided 3DGS Refinement stage then uses these masks to jointly update the new Gaussians $G'_{1:N'}$ for both novel-view rendering and inpainting.
  • Figure 2: Inferring Depth-Guided Inpainting Mask. Taking $\{I_1, M_1\}$ at view $\xi_1$ as an example reference view, the original background region $I^B_1$ is first extracted. We then project the background region $I^B_2$ from $\xi_2$ to $\xi_1$, updating ${I'}^B_1$ and the associated inpainting mask $M'_1$. By repeating this process across camera views, the final inpainting mask $M'_1$ contains only the regions that are not visible in any training camera view.
  • Figure 3: Qualitative results on the SPIn-NeRF mirzaei2023spin dataset. Two different views of the same scene are shown for each inpainting example. We compare rendering results against MVIP-NeRF chen2024mvip, MALD-NeRF lin2024maldnerf, and GScream wang2024gscream. The regions highlighted by the red boxes show that our 3DGIC achieves better multi-view consistency and rendering fidelity.
  • Figure 4: Qualitative results on the Figurines scene from the LeRF kerr2023lerf dataset. We compare the rendering results with SPIn-NeRF mirzaei2023spin, Gaussian Grouping ye2023gaussiangrouping, and GScream wang2024gscream. The three rows show different views of the scene, whereas the first column shows the input images with the object masks of the unwanted object. The regions highlighted by the red boxes show that our 3DGIC inpaints a smoother table surface without artifacts.
  • Figure 5: Qualitative results on the Counter scene from the MipNeRF360 barron2022mipnerf360 dataset. We compare the rendering results with SPIn-NeRF mirzaei2023spin, Gaussian Grouping ye2023gaussiangrouping, and GScream wang2024gscream. The three rows show different views of the scene, and a region in the first row is zoomed in to highlight the differences between methods. The regions highlighted by the red boxes show that our 3DGIC correctly inpaints the water bottle without altering any other objects on the table (e.g., the plastic cover).
  • ...and 2 more figures
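The mask-inference step described for Figure 2 can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes pinhole intrinsics `K`, world-to-camera extrinsics `R`, `t` per view (x_cam = R x_world + t), rendered z-depth maps, and nearest-pixel splatting. It also assumes a simple z-buffer so that the nearest reprojected point wins when several land on one pixel. The dict layout and the function names `unproject` and `refine_mask` are hypothetical.

```python
import numpy as np


def unproject(depth, K, R, t):
    """Lift every pixel of a z-depth map to world coordinates (HW x 3).

    Assumes world-to-camera extrinsics, x_cam = R @ x_world + t, and a
    pinhole intrinsic matrix K; both conventions are assumptions.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    cam = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)  # 3 x HW points in camera frame
    world = R.T @ (cam - t.reshape(3, 1))                     # invert x_cam = R x_world + t
    return world.T


def refine_mask(ref, others):
    """Shrink the reference mask to pixels that no other view sees as background.

    Each view is a hypothetical dict with 'image' (H, W, 3), 'mask' (H, W bool,
    True = object to remove), 'depth' (H, W, rendered z-depth), 'K', 'R', 't'.
    Returns the refined mask M'_1 and the updated background image I'^B_1.
    """
    mask = ref["mask"].copy()
    background = np.where(mask[..., None], 0.0, ref["image"]).astype(np.float64)
    h, w = mask.shape
    for view in others:
        keep = ~view["mask"].reshape(-1)                      # background pixels of this view
        pts = unproject(view["depth"], view["K"], view["R"], view["t"])[keep]
        colors = view["image"].reshape(-1, 3)[keep]
        cam = ref["R"] @ pts.T + ref["t"].reshape(3, 1)       # move points into the reference camera
        z = cam[2]
        front = z > 1e-6                                      # drop points behind the reference camera
        proj = ref["K"] @ cam[:, front]
        u = np.round(proj[0] / proj[2]).astype(int)
        v = np.round(proj[1] / proj[2]).astype(int)
        z, colors = z[front], colors[front]
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        u, v, z, colors = u[inside], v[inside], z[inside], colors[inside]
        hit = mask[v, u]                                      # only fill pixels that are still masked
        lin, z, colors = (v * w + u)[hit], z[hit], colors[hit]
        order = np.argsort(z, kind="stable")                  # simple z-buffer: nearest point first
        lin, colors = lin[order], colors[order]
        pixels, first = np.unique(lin, return_index=True)     # first hit per pixel = nearest depth
        background[pixels // w, pixels % w] = colors[first]
        mask[pixels // w, pixels % w] = False
    return mask, background
```

As a usage example, take 8×8 images of a fronto-parallel plane at depth 4, with `K` having focal length 4 and principal point (3.5, 3.5). If a second camera is shifted one unit along x, every reference pixel except the first column receives a reprojected background color, so only that column stays in the refined mask. The sketch does not model occlusion by scene content other than the reprojected points themselves. `np.unique` is used instead of a plain fancy-index write because NumPy does not guarantee which value survives repeated-index assignment.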