View-consistent Object Removal in Radiance Fields

Yiren Lu, Jing Ma, Yu Yin

TL;DR

This work tackles cross-view inconsistency in editing radiance fields by proposing a single-reference inpainting pipeline that propagates edits to all training views through depth-based projection. It augments realism under varying lighting with directional appearance variants and enforces consistency with depth-aware occlusion handling, while yielding a fast, robust multi-view segmentation as a byproduct. The method is demonstrated on NeRF and 3D-Gaussian Splatting backends, achieving superior cross-view coherence and image quality compared to strong baselines, as evidenced by quantitative metrics including PSNR, LPIPS, and FID. Training relies on a reconstruction loss over projected views, and the approach promises practical gains for RF editing in VR/AR and related applications.

Abstract

Radiance Fields (RFs) have emerged as a crucial technology for 3D scene representation, enabling the synthesis of novel views with remarkable realism. However, as RFs become more widely used, the need for effective editing techniques that maintain coherence across different perspectives becomes evident. Current methods primarily depend on per-frame 2D image inpainting, which often fails to maintain consistency across views, thus compromising the realism of edited RF scenes. In this work, we introduce a novel RF editing pipeline that significantly enhances consistency by requiring the inpainting of only a single reference image. This image is then projected across multiple views using a depth-based approach, effectively reducing the inconsistencies observed with per-frame inpainting. However, projections typically assume photometric consistency across views, which is often impractical in real-world settings. To accommodate realistic variations in lighting and viewpoint, our pipeline adjusts the appearance of the projected views by generating multiple directional variants of the inpainted image, thereby adapting to different photometric conditions. Additionally, we present an effective and robust multi-view object segmentation approach as a valuable byproduct of our pipeline. Extensive experiments demonstrate that our method significantly surpasses existing frameworks in maintaining content consistency across views and enhancing visual quality. More results are available at https://vulab-ai.github.io/View-consistent_Object_Removal_in_Radiance_Fields.
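The depth-based projection described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes a pinhole camera shared by both views, a known rigid transform from the reference camera frame to the target camera frame, and a simple depth comparison as a stand-in for occlusion handling. All names (`unproject`, `rigid_transform`, `project`, `warp_pixel`, `tol`) are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy); pinhole model (assumed)


def unproject(u: float, v: float, depth: float, k: Intrinsics) -> Vec3:
    """Lift pixel (u, v) with z-depth `depth` to a 3D point in the camera frame."""
    fx, fy, cx, cy = k
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def rigid_transform(rot: Sequence[Sequence[float]], trans: Vec3, p: Vec3) -> Vec3:
    """Apply p' = R p + t, mapping reference-camera to target-camera coordinates."""
    x, y, z = (sum(rot[i][j] * p[j] for j in range(3)) + trans[i] for i in range(3))
    return (x, y, z)


def project(p: Vec3, k: Intrinsics) -> Optional[Tuple[float, float]]:
    """Project a camera-frame point to pixel coordinates; None if behind the camera."""
    fx, fy, cx, cy = k
    if p[2] <= 0.0:
        return None
    return (fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy)


def warp_pixel(
    u: float,
    v: float,
    depth_ref: float,
    rot: Sequence[Sequence[float]],
    trans: Vec3,
    k: Intrinsics,
    depth_tgt: Optional[float] = None,
    tol: float = 1e-2,
) -> Optional[Tuple[float, float]]:
    """Map a reference pixel into the target view.

    `depth_tgt` is the target view's own depth at the landing pixel, if known.
    A projected point that lies farther away than that depth (plus `tol`) is
    treated as occluded and dropped. This test is an assumption of the sketch.
    """
    p_tgt = rigid_transform(rot, trans, unproject(u, v, depth_ref, k))
    uv = project(p_tgt, k)
    if uv is None:
        return None
    if depth_tgt is not None and p_tgt[2] > depth_tgt + tol:
        return None  # hidden behind closer geometry in the target view
    return uv


if __name__ == "__main__":
    K = (100.0, 100.0, 50.0, 50.0)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # A small sideways camera offset shifts the pixel horizontally.
    print(warp_pixel(10.0, 20.0, 2.0, identity, (0.2, 0.0, 0.0), K))
```

In a full pipeline this mapping would be applied to every inpainted pixel of the reference image, with the masked region of each training view filled from the pixels that land inside it.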

Paper Structure

This paper contains 14 sections, 9 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: An illustration of our radiance field (RF) inpainting pipeline. Unlike conventional methods that inpaint on a per-frame basis, our approach inpaints a single reference image and applies depth-based projection to seamlessly extend the modifications across multiple views. We show that our method not only enhances the quality of inpainted RF scenes but also significantly improves correspondence between different perspectives.
  • Figure 2: An overview of our method. We first select a reference camera pose from the training set: the pose with the minimal average distance to all other poses on the SE(3) manifold. The reference view then goes through three steps (masking, inpainting, and depth estimation), which yield the mask $M_r$, the inpainted image $I_r$, and the depth map $D_r$, respectively. These outputs drive the multi-view projection, which produces a set of inpainted images from multiple views. Finally, an inpainted radiance field is trained on these images.
  • Figure 3: Qualitative comparison between our method and baseline methods. For each scene, we show images from two different views to compare both rendering quality and cross-view consistency.
  • Figure 4: Visualization of feature matching results within the masked region. Ground Truth, SPIn-NeRF, NeRFiller, and Ours-NeRF yield 329, 193, 84, and 324 matches, respectively. The original scene is shown in Fig. 3.
  • Figure 5: The failure case of OR-NeRF (point prompt) is shown in the upper-left corner. The first row shows the manually annotated point prompts in a selected view and the mask generated by SAM. The second row shows the point prompts propagated to another view and the corresponding mask. One of the propagated point prompts does not lie on the expected region, so the generated mask is incomplete.
  • ...and 3 more figures
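
The reference-view selection described in the Figure 2 caption (the pose with the minimal average distance to all other poses on SE(3)) could be sketched as follows. The caption does not state the exact metric, so this sketch assumes one common choice: the geodesic rotation angle plus a weighted Euclidean distance between camera positions. The function names and the weight `w` are hypothetical, and poses are assumed to be camera-to-world pairs `(R, t)`.

```python
import math
from typing import List, Sequence, Tuple

Pose = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (3x3 rotation, 3-vector position)


def rotation_angle(ra: Sequence[Sequence[float]], rb: Sequence[Sequence[float]]) -> float:
    """Geodesic angle (radians) between two rotations, from trace(Ra^T Rb)."""
    trace = sum(ra[k][i] * rb[k][i] for i in range(3) for k in range(3))
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against round-off
    return math.acos(cos_theta)


def pose_distance(a: Pose, b: Pose, w: float = 1.0) -> float:
    """Assumed SE(3) distance: rotation angle + w * distance between positions."""
    return rotation_angle(a[0], b[0]) + w * math.dist(a[1], b[1])


def select_reference(poses: List[Pose], w: float = 1.0) -> int:
    """Return the index of the pose with the smallest mean distance to the others."""
    if len(poses) < 2:
        return 0

    def mean_distance(i: int) -> float:
        others = (pose_distance(poses[i], p, w) for j, p in enumerate(poses) if j != i)
        return sum(others) / (len(poses) - 1)

    return min(range(len(poses)), key=mean_distance)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cameras = [(identity, (0.0, 0.0, 0.0)),
               (identity, (1.0, 0.0, 0.0)),
               (identity, (5.0, 0.0, 0.0))]
    print(select_reference(cameras))  # index of the most central camera
```

The weight `w` trades rotation against translation and depends on scene scale; picking the most central pose is intended to keep the reference view close to as many training views as possible, so that projected pixels cover more of each masked region.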