MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior
Honghua Chen, Chen Change Loy, Xingang Pan
TL;DR
MVIP-NeRF tackles NeRF scene inpainting with view-consistent appearance and geometry by using diffusion priors in a joint RGB and normal-map optimization framework. It introduces appearance and geometry diffusion priors within the Score Distillation Sampling (SDS) paradigm, a multi-view SDS score that stabilizes completion under large view changes, and a smoothed normal-based geometry representation. On the Real-S and Real-L datasets, the method improves perceptual appearance quality (measured by LPIPS) and geometric coherence over state-of-the-art NeRF inpainting methods, supported by ablations comparing normal versus depth guidance and single-view versus multi-view distillation. Relying on a diffusion prior reduces the dependency on explicit per-view 2D inpainting, but it incurs higher computational cost and requires careful tuning of classifier-free guidance (CFG) and the timestep schedule, which highlights practical trade-offs for 3D content restoration in NeRF scenes.
Abstract
Despite the emergence of successful NeRF inpainting methods built upon explicit 2D inpainting supervision of RGB and depth, these methods are inherently constrained by the capabilities of their underlying 2D inpainters. There are two key reasons: (i) independently inpainting the constituent images results in view-inconsistent imagery, and (ii) 2D inpainters struggle to ensure high-quality geometry completion and alignment with the inpainted RGB images. To overcome these limitations, we propose a novel approach called MVIP-NeRF that harnesses the potential of diffusion priors for NeRF inpainting, addressing both appearance and geometry. MVIP-NeRF performs joint inpainting across multiple views to reach a consistent solution, achieved via an iterative optimization process based on Score Distillation Sampling (SDS). Besides recovering the rendered RGB images, we also extract normal maps as a geometric representation and define a normal SDS loss that encourages accurate geometry inpainting and alignment with the appearance. Additionally, we formulate a multi-view SDS score function to distill generative priors simultaneously from images of different views, ensuring consistent visual completion under large view variations. Our experimental results show better appearance and geometry recovery than previous NeRF inpainting methods.
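
To make the SDS-based optimization described above more concrete, the following is a minimal NumPy sketch of an SDS-style gradient on a rendered image (RGB or normal map). It is an illustration under stated assumptions, not the authors' implementation: the function names, the `predict_noise` callable standing in for a frozen diffusion model, the weighting w(t) = 1 - alpha_bar_t, the optional inpainting mask, and the way per-view gradients are combined (a shared timestep, gradients stacked per view) are all hypothetical choices made for this example.

```python
import numpy as np


def sds_gradient(x, predict_noise, t, alphas_cumprod, rng, mask=None):
    """SDS-style gradient with respect to a rendered image x of shape (H, W, C).

    predict_noise(x_t, t) is a hypothetical stand-in for a frozen diffusion
    model's noise prediction; it is not a real library call.
    """
    a = alphas_cumprod[t]
    eps = rng.standard_normal(x.shape)
    # Forward diffusion: noise the rendering to timestep t.
    x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps
    eps_hat = predict_noise(x_t, t)
    # Assumed weighting w(t) = 1 - alpha_bar_t (a common choice, not from the source).
    grad = (1.0 - a) * (eps_hat - eps)
    if mask is not None:
        # Assumption: only the masked (to-be-inpainted) pixels receive gradient.
        grad = grad * mask
    return grad


def multi_view_sds_gradient(views, predict_noise, t, alphas_cumprod, rng, masks=None):
    """Per-view SDS gradients at one shared timestep, stacked to (V, H, W, C).

    Sharing t across views is an assumption of this sketch, not necessarily
    the paper's multi-view score formulation.
    """
    if masks is None:
        masks = [None] * len(views)
    grads = [
        sds_gradient(v, predict_noise, t, alphas_cumprod, rng, m)
        for v, m in zip(views, masks)
    ]
    return np.stack(grads)
```

In a full pipeline, a gradient like this would be backpropagated through the differentiable renderer into the NeRF parameters, once for the rendered RGB image and once for the rendered normal map. Here the renderer and the diffusion model are deliberately left out so that the sketch stays self-contained.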
