InNeRF360: Text-Guided 3D-Consistent Object Inpainting on 360-degree Neural Radiance Fields

Dongqing Wang, Tong Zhang, Alaa Abboud, Sabine Süsstrunk

TL;DR

Extensive segmentation and inpainting experiments on 360° and frontal-facing NeRFs show that InNeRF360 is effective and enhances NeRF's editability.

Abstract

We propose InNeRF360, an automatic system that accurately removes text-specified objects from 360-degree Neural Radiance Fields (NeRF). The challenge is to effectively remove objects while inpainting perceptually consistent content for the missing regions, which is particularly demanding for existing NeRF models due to their implicit volumetric representation. Moreover, unbounded scenes are more prone to floater artifacts in the inpainted region than frontal-facing scenes, as the change of object appearance and background across views is more sensitive to inaccurate segmentations and inconsistent inpainting. With a trained NeRF and a text description, our method efficiently removes specified objects and inpaints visually consistent content without artifacts. We apply depth-space warping to enforce consistency across multiview text-encoded segmentations, and then refine the inpainted NeRF model using perceptual priors and 3D diffusion-based geometric priors to ensure visual plausibility. Through extensive experiments in segmentation and inpainting on 360-degree and frontal-facing NeRFs, we show that our approach is effective and enhances NeRF's editability. Project page: https://ivrl.github.io/InNeRF360.

Paper Structure

This paper contains 11 sections, 9 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Given a pre-trained NeRF and a text instruction to remove specific objects (e.g., "Remove the flowerpot and flowers"), InNeRF360 produces accurate multiview object segmentations and outputs an inpainted NeRF with visually consistent content.
  • Figure 2: Overview of the InNeRF360 framework. 1. Multiview Consistent Segmentation. We initialize masks using bounding boxes from the object detector, which encodes both the source image and the text. With rendered depth from the input NeRF, we apply depth-warping prompt refinement to iteratively update the point prompts from which the Segment Anything Model (SAM) outputs view-consistent 2D segmentations. 2. Inpainting 360° NeRF. We obtain edited images by passing the masks and source images through an image inpainter, and use them to retrain the inpainted NeRF. We then finetune the new NeRF model using a geometric prior trained from a 3D diffusion model and a masked perceptual prior.
  • Figure 3: Inconsistent bounding boxes across different views. (a) and (b) are from the same dataset under the same instruction, yet the generated bounding boxes differ. After depth-warping refinement (point prompts shown as red dots), (c) shows an accurate segmentation.
  • Figure 4: Examples of artifacts in the initialized NeRF. 2D inpaintings contain inconsistent inpainted pixels that accumulate in the 3D inpainted region and appear as floater artifacts.
  • Figure 5: Qualitative comparison on 3D object segmentation. InNeRF360 outputs accurate masks for complex cases containing transparent (vase) or incomplete objects (partial slippers).
  • ...and 4 more figures
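
The depth-warping step mentioned in the Figure 2 caption can be illustrated with a small geometric sketch. This is not the authors' implementation: it only shows the standard way to reproject pixel locations (for example, point prompts) from one view into another using a rendered depth map. The function name `warp_points`, the shared pinhole intrinsics `K`, the z-depth convention, and the OpenCV-style camera axes (+z forward) are all assumptions made for illustration; many NeRF codebases use different conventions.

```python
import numpy as np


def warp_points(points_uv, depth_src, K, c2w_src, c2w_dst):
    """Reproject source-view pixels into a destination view (hypothetical helper).

    points_uv: (N, 2) array of (u, v) pixel coordinates in the source view.
    depth_src: (H, W) source depth map; assumed to store z-depth along +z.
    K:         (3, 3) pinhole intrinsics, assumed shared by both views.
    c2w_src, c2w_dst: (4, 4) camera-to-world matrices (assumed +z forward).

    Returns (uv_dst, in_front): (N, 2) destination pixels and an (N,) mask
    marking points that lie in front of the destination camera.
    """
    points_uv = np.asarray(points_uv, dtype=np.float64)
    u, v = points_uv[:, 0], points_uv[:, 1]

    # Look up depth at the nearest pixel (no bounds checking in this sketch).
    z = depth_src[np.round(v).astype(int), np.round(u).astype(int)]

    # Back-project pixels to 3D points in the source camera frame.
    pix_h = np.stack([u, v, np.ones_like(u)], axis=0)        # (3, N)
    cam_src = (np.linalg.inv(K) @ pix_h) * z                 # (3, N)

    # Source camera frame -> world -> destination camera frame.
    cam_src_h = np.vstack([cam_src, np.ones((1, cam_src.shape[1]))])
    world = c2w_src @ cam_src_h                              # (4, N)
    cam_dst = (np.linalg.inv(c2w_dst) @ world)[:3]           # (3, N)

    # Project into the destination image, guarding against division by zero.
    in_front = cam_dst[2] > 1e-8
    proj = K @ cam_dst
    uv_dst = (proj[:2] / np.where(in_front, proj[2], 1.0)).T  # (N, 2)
    return uv_dst, in_front
```

In a pipeline like the one the caption describes, the warped locations could serve as point prompts for a segmentation model in the destination view; how the prompts are selected and iteratively updated is specific to the paper and is not reproduced here.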