SIn-NeRF2NeRF: Editing 3D Scenes with Instructions through Segmentation and Inpainting
Jiseung Hong, Changmin Lee, Gyusang Yu
TL;DR
This work tackles editing 3D scenes represented by Neural Radiance Fields (NeRF) by disentangling a target object from its background and enabling geometric edits on the object. The proposed SIn-NeRF2NeRF (sn2n) splits the scene into an editable RGBA object NeRF (DSNeRF) and a background NeRF inpainted with SPIn-NeRF, then fuses the two so that the object can be translated, rotated, and scaled while its appearance is edited with text prompts. Key contributions include a complete pipeline combining 2D multiview segmentation (SAM), RGBA object editing with an Instruct-NeRF2NeRF-inspired diffusion process, and background inpainting with depth-aware 3D fusion, validated with CLIP-based metrics showing competitive edit fidelity. This enables more controllable and precise 3D scene editing for VR/AR applications: objects can be modified while the surrounding scene is preserved and restored.
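The depth-aware fusion step mentioned above can be illustrated with a minimal per-pixel sketch. It assumes the object NeRF and the background NeRF have each been rendered from the same camera into color and depth maps; the function name `fuse_depth_aware` and its arguments are hypothetical and are not taken from the sn2n code, which may instead fuse the two fields along each ray.

```python
import numpy as np

def fuse_depth_aware(obj_rgba, obj_depth, bg_rgb, bg_depth):
    """Composite an object render over a background render, pixel by pixel.

    obj_rgba:  (H, W, 4) object color and alpha, floats in [0, 1]
    obj_depth: (H, W)    object depth along each camera ray
    bg_rgb:    (H, W, 3) inpainted background color
    bg_depth:  (H, W)    background depth along each camera ray
    """
    alpha = obj_rgba[..., 3:4]
    # The object contributes only where it lies in front of the background.
    in_front = (obj_depth < bg_depth)[..., None]
    weight = alpha * in_front
    rgb = weight * obj_rgba[..., :3] + (1.0 - weight) * bg_rgb
    # Report the object's depth where it is mostly opaque and in front.
    use_obj = in_front[..., 0] & (alpha[..., 0] > 0.5)
    depth = np.where(use_obj, obj_depth, bg_depth)
    return rgb, depth
```

With this rule, an object that has been moved behind a piece of background geometry is correctly hidden by it instead of being pasted on top.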
Abstract
TL;DR: Perform 3D object editing selectively by disentangling the object from the background scene. Instruct-NeRF2NeRF (in2n) is a promising method that enables editing of 3D scenes represented as Neural Radiance Fields (NeRF) using text prompts. However, geometric modifications such as shrinking, scaling, or moving are difficult to perform when the background and the object are edited simultaneously. In this project, we enable geometric changes to objects within a 3D scene by separating the object from the scene and then editing it selectively. We segment the object and inpaint the background separately, and demonstrate various examples of freely resizing or moving the disentangled object within three-dimensional space.
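To make "resizing or moving" concrete, the following is a minimal sketch of one common way to place a separated object in a scene: build a similarity transform (scale, rotation, translation) and map world-space sample points back into the object's original frame before querying the object NeRF. The helper names `similarity_matrix` and `to_object_frame` are hypothetical, rotation is restricted to the z axis for brevity, and nothing here is claimed to match the authors' implementation.

```python
import numpy as np

def similarity_matrix(scale, yaw_deg, translation):
    """Build a 4x4 transform: uniform scale, rotation about z, then translation."""
    theta = np.radians(yaw_deg)
    c, s = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -s, 0.0],
                         [s,  c, 0.0],
                         [0.0, 0.0, 1.0]])
    matrix = np.eye(4)
    matrix[:3, :3] = scale * rotation
    matrix[:3, 3] = translation
    return matrix

def to_object_frame(points_world, matrix):
    """Map world-space points (N, 3) into the object's original frame.

    The edited object is rendered by querying the unmodified object field
    at these inverse-transformed locations.
    """
    ones = np.ones((len(points_world), 1))
    homogeneous = np.concatenate([points_world, ones], axis=1)
    return (np.linalg.inv(matrix) @ homogeneous.T).T[:, :3]
```

For example, `similarity_matrix(2.0, 90.0, [1.0, 0.0, 0.0])` doubles the object's size, turns it a quarter turn, and shifts it one unit along x. One caveat of this sketch is that it does not rescale volume density, which a full renderer would need to account for when the scale differs from 1.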
