
SIn-NeRF2NeRF: Editing 3D Scenes with Instructions through Segmentation and Inpainting

Jiseung Hong, Changmin Lee, Gyusang Yu

TL;DR

This work tackles editing 3D scenes represented by Neural Radiance Fields (NeRF) by disentangling a target object from its background and enabling geometric edits on the object. The proposed SIn-NeRF2NeRF (sn2n) splits the scene into an editable RGBA object NeRF (DSNeRF) and an inpainted background NeRF obtained via SPIn-NeRF, then fuses them so that the object can be translated, rotated, and scaled under the guidance of text prompts. Key contributions include a complete pipeline combining 2D multiview segmentation (SAM), RGBA object editing with an Instruct-NeRF2NeRF-inspired diffusion process, and background inpainting with depth-aware 3D fusion, validated with CLIP-based metrics showing competitive edit fidelity. This enables more controllable and precise 3D scene editing for VR/AR applications, since objects can be modified while the surrounding scene is preserved and restored.
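As an illustration of what depth-aware fusion of the two renders could look like, here is a minimal per-pixel sketch. It assumes the object NeRF yields an RGBA image with a depth map and the background NeRF yields an RGB image with a depth map from the same camera; the function name `fuse_depth_aware` and the array layout are hypothetical and are not taken from the paper.

```python
import numpy as np

def fuse_depth_aware(obj_rgba, obj_depth, bg_rgb, bg_depth):
    """Composite an RGBA object render over a background render.

    Assumed shapes (hypothetical layout, not from the paper):
      obj_rgba: (H, W, 4) floats in [0, 1]
      bg_rgb:   (H, W, 3) floats in [0, 1]
      obj_depth, bg_depth: (H, W) distances from the camera
    """
    alpha = obj_rgba[..., 3:4]
    # The object contributes only where it is closer than the background.
    visible = (obj_depth < bg_depth)[..., None]
    weight = alpha * visible
    return weight * obj_rgba[..., :3] + (1.0 - weight) * bg_rgb
```

With this rule, an object pixel that lies behind the inpainted background is occluded by it, and a fully transparent object pixel leaves the background unchanged.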

Abstract

TL;DR: Edit a 3D object selectively by disentangling it from the background scene. Instruct-NeRF2NeRF (in2n) is a promising method that enables editing of 3D scenes represented as Neural Radiance Fields (NeRF) using text prompts. However, it is challenging to perform geometric modifications such as shrinking, scaling, or moving on both the background and the object simultaneously. In this project, we enable geometric changes to objects within the 3D scene by separating the object from the scene and then editing it selectively. We perform object segmentation and background inpainting separately, and demonstrate various examples of freely resizing or moving the disentangled objects within three-dimensional space.
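To make "resizing or moving" concrete, the sketch below treats a geometric edit as a similarity transform (uniform scale, rotation, translation) on the object's coordinates. This is an assumption for illustration, not the paper's stated implementation. The helper names are hypothetical. The inverse mapping is the useful direction when a world-space sample point must be looked up in an unmodified object field.

```python
import numpy as np

def rotation_z(theta):
    """3x3 rotation matrix about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def object_to_world(points, scale, R, t):
    """Apply x_world = scale * R @ x_obj + t to (N, 3) points."""
    return scale * (points @ R.T) + t

def world_to_object(points, scale, R, t):
    """Invert the transform: x_obj = R^T @ (x_world - t) / scale."""
    return ((points - t) @ R) / scale
```

For example, scaling by 2, rotating 90 degrees about z, and shifting by (0, 0, 1) sends the object-space point (1, 0, 0) to (0, 2, 1) in world space, and `world_to_object` maps it back.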

Paper Structure (15 sections, 1 equation, 6 figures, 2 tables, 1 algorithm)


Figures (6)

  • Figure 1: Overview: We propose SIn-NeRF2NeRF, a method that enables a wide range of object edits, including translation, rotation, and scale changes.
  • Figure 2: Main Framework: Our method takes as input an original NeRF scene and a text prompt, and yields an edited NeRF scene with the disentangled object. Implementation details can be found in the implementation section.
  • Figure 3: Iterative Dataset Update (IDU)
  • Figure 4: Object scene with (left two) and without (right two) the random background color method.
  • Figure 5: Qualitative Results: Object Transformation.
  • ...and 1 more figures