Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model
Hongliang Zhong, Can Wang, Jingbo Zhang, Jing Liao
TL;DR
The paper tackles the challenge of inserting new objects into 3D scenes represented by Gaussian Splatting while keeping the result consistent across views. It introduces MVInpainter, a multi-view diffusion model built atop Stable Video Diffusion and augmented with a ControlNet-based conditioning path that enforces view-aware inpainting across multiple viewpoints. A mask-aware reconstruction stage then refines the edited Gaussian Splatting representation using both the inpainted views and the original training views, reducing artifacts and preserving the scene background. Quantitative and qualitative results show better view consistency, object quality, and scene harmony than SDS-based and single-view-inpainting baselines, indicating meaningful gains for 3D content creation in VR, gaming, and digital media. Limitations include scarce data for full 360-degree coverage, difficulty with object removal, and the handling of shadows, suggesting avenues for future work.
Abstract
Generating and inserting new objects into 3D content is a compelling approach for achieving versatile scene recreation. Existing methods, which rely on SDS optimization or single-view inpainting, often struggle to produce high-quality results. To address this, we propose a novel method for object insertion in 3D content represented by Gaussian Splatting. Our approach introduces a multi-view diffusion model, dubbed MVInpainter, which is built upon a pre-trained Stable Video Diffusion model to facilitate view-consistent object inpainting. Within MVInpainter, we incorporate a ControlNet-based conditional injection module to enable controlled and more predictable multi-view generation. After generating the multi-view inpainted results, we further propose a mask-aware 3D reconstruction technique to refine the Gaussian Splatting reconstruction from these sparse inpainted views. By combining these techniques, our approach yields diverse results, ensures view-consistent and harmonious insertions, and produces better object quality. Extensive experiments demonstrate that our approach outperforms existing methods.
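The abstract does not spell out the mask-aware reconstruction objective, so the following is only a minimal sketch of one plausible reading: inpainted views supervise the rendering inside the inserted-object mask, while the original training views supervise it outside the mask so the background is preserved. The function names `masked_l1` and `mask_aware_loss`, the L1 photometric term, and the inside/outside split are all illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def masked_l1(rendered, target, mask):
    """Mean absolute error over the pixels selected by `mask` (1 = use pixel)."""
    mask = np.asarray(mask, dtype=np.float64)
    if mask.ndim == rendered.ndim - 1:
        mask = mask[..., None]  # add a channel axis so the mask broadcasts over RGB
    weights = np.broadcast_to(mask, rendered.shape)
    total = weights.sum()
    if total == 0:
        return 0.0  # no supervised pixels in this view
    return float((np.abs(rendered - target) * weights).sum() / total)


def mask_aware_loss(rendered, view_image, object_mask, is_inpainted_view):
    """Hypothetical mask-aware photometric loss (an assumption, not the paper's loss).

    Inpainted views constrain only the inserted-object region; original
    training views constrain only the background outside that region.
    """
    object_mask = np.asarray(object_mask)
    if is_inpainted_view:
        return masked_l1(rendered, view_image, object_mask)
    return masked_l1(rendered, view_image, 1 - object_mask)


# Toy 2x2 RGB example: the object occupies the top-left pixel only.
rendered = np.zeros((2, 2, 3))
view = np.zeros((2, 2, 3))
view[0, 0] = 1.0
mask = np.array([[1, 0], [0, 0]])

loss_inpainted = mask_aware_loss(rendered, view, mask, is_inpainted_view=True)
loss_training = mask_aware_loss(rendered, view, mask, is_inpainted_view=False)
```

In this toy case the inpainted view penalizes the mismatch at the object pixel, while the training view ignores it and looks only at the background. In a real pipeline the rendered image would come from a differentiable Gaussian Splatting renderer and the loss would be written in an autodiff framework rather than NumPy.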
