Point Resampling and Ray Transformation Aid to Editable NeRF Models

Zhenyang Li, Zilong Chen, Feifan Qu, Mingqing Wang, Yizhou Zhao, Kai Zhang, Yifan Peng

TL;DR

We propose a plug-and-play inpainting module, dubbed differentiable neural-point resampling (DNR), that interpolates regions in 3D space at the original ray locations within the implicit space. It facilitates object removal and scene inpainting tasks, where our pipeline achieves state-of-the-art performance.

Abstract

In NeRF-aided editing tasks, moving an object makes supervision difficult to generate because the object's position varies. Moreover, removing scene objects often leaves empty regions that NeRF models struggle to inpaint effectively. We propose an implicit ray transformation strategy that allows direct manipulation of a 3D object's pose by operating on the neural points along NeRF rays. To address the challenge of inpainting potential empty regions, we present a plug-and-play inpainting module, dubbed differentiable neural-point resampling (DNR), which interpolates those regions in 3D space at the original ray locations within the implicit space, thereby facilitating object removal and scene inpainting tasks. Importantly, employing DNR narrows the gap between ground-truth and predicted implicit features, potentially increasing the mutual information (MI) of the features across rays. We then combine DNR and ray transformation to construct a point-based editable NeRF pipeline, P$\text{R}^2$T-NeRF. Results, evaluated primarily on 3D object removal and inpainting tasks, indicate that our pipeline achieves state-of-the-art performance. In addition, our pipeline supports high-quality rendering for diverse editing operations without requiring extra supervision.
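
The abstract does not spell out how the implicit ray transformation is formulated, so the following is only a minimal sketch of one common way to realize the idea: instead of moving the stored neural points, the sample points along a ray are mapped through the inverse of the object's rigid motion, so that features are still queried at the object's original location. All function names (`rotation_z`, `sample_ray`, `inverse_rigid`, `warp_samples`) and the `in_object` predicate are hypothetical and are not taken from the paper.

```python
import math

def rotation_z(theta):
    """3x3 rotation about the z-axis as row-major nested lists."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def sample_ray(origin, direction, t_values):
    """Sample points o + t * d along a single ray."""
    return [tuple(o + t * d for o, d in zip(origin, direction))
            for t in t_values]

def inverse_rigid(point, R, trans):
    """Apply the inverse of x -> R x + trans, i.e. R^T (x - trans)."""
    d = [p - t for p, t in zip(point, trans)]
    return tuple(sum(R[r][c] * d[r] for r in range(3)) for c in range(3))

def warp_samples(points, R, trans, in_object):
    """Map ray samples back into the object's original frame.

    A sample is replaced by its back-mapped position only if that
    position lies inside the object's original region (assumed to be
    given by the predicate `in_object`); other samples are unchanged.
    """
    warped = []
    for p in points:
        q = inverse_rigid(p, R, trans)
        warped.append(q if in_object(q) else p)
    return warped
```

As a usage example under these assumptions, an object originally centered at (1, 0, 0) that is rotated by 90 degrees about z and then translated by (1, 0, 0) ends up at (1, 1, 0); a ray sample at (1, 1, 0) is therefore mapped back to (1, 0, 0), where the original features live, while samples far from the object are left as they are.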

Paper Structure

This paper contains 23 sections, 11 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: P$\text{R}^2$T-NeRF supports robust manipulations for novel view synthesis and scene editing, in particular object removal & inpainting. Left: Workflow and progressive results of our pipeline in object editing of a natural scene; Right: Distribution of color (features) and density in the volume grid of vanilla NeRF, where the example scene li2021neural is decomposed into quasi-continuous planes along depth, while object features spread out to varying degrees across the planes. We observe that the foreground shows only a kid within shallow depth (planes in black), while the far-depth background gradually appears in the planes behind.
  • Figure 2: Overview of our editable rendering pipeline and transformations. Left: The object removal & inpainting framework integrates the SAM model to generate a 2D mask for the target object. This 2D mask is then unprojected into 3D space, creating a point cloud mask, while features are extracted from the original image to serve as point cloud features and fine-tuned to derive the neural point cloud of the unedited scene. Next, the DNR module is used to mend the features of empty regions in the masked 3D points. Finally, we supervise the rendered views with inpainted images generated by LaMa; Right: Schematic visualization of rigid and non-rigid transformations.
  • Figure 3: Overview of DNR strategies in \ref{subsec:diff-ray} c). The three feature resampling schemes (NI, KWA, and GWFA) are illustrated, with different sum weights defined.
  • Figure 4: Qualitative comparison of P$\text{R}^2$T-NeRF with counterparts. A comparative analysis of inpainting results is conducted across three scenes using the SPIn-NeRF dataset mirzaei2023spin. The color frames in the "Inpainting GT" column indicate the locations of the target object to be removed. Columns (from $4^{\text{th}}$ to $6^{\text{th}}$): the novel view of the scene generated by Ours, SPIn-NeRF, OR-NeRF, and NeRF-In. Note that recovering shadows and periodic textures proves challenging for both the baselines and our model; nevertheless, our model demonstrates superior performance in alleviating shadows and twisted texture artifacts in the rendering results.
  • Figure 5: Qualitative results on the IBRNet dataset. In this example, we seek to inpaint the obstructed non-target object (highlighted box region). Herein, 'MoV' refers to mono-view, while 'MV' represents multi-view.
  • ...and 2 more figures
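
The Figure 3 caption describes feature resampling as a weighted sum with scheme-specific weights, but it does not expand the acronyms NI, KWA, and GWFA or define those weights. The sketch below therefore does not reproduce any of the three schemes; it only illustrates the general pattern of filling the feature of a masked 3D point from its unmasked neighbors, using inverse-distance weights over the k nearest points as an assumed, illustrative weighting. The function names `resample_feature` and `fill_masked` and the parameters `k` and `eps` are hypothetical.

```python
import math

def resample_feature(query, points, features, k=3, eps=1e-8):
    """Weighted sum of the features of the k nearest known points.

    Weights are inverse distances (an assumption made for illustration,
    not the weighting defined in the paper), normalized to sum to one.
    """
    nearest = sorted(
        (math.dist(query, p), i) for i, p in enumerate(points)
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    dim = len(features[0])
    return [
        sum(w * features[i][c] for w, (_, i) in zip(weights, nearest)) / total
        for c in range(dim)
    ]

def fill_masked(points, features, mask, k=3):
    """Return a copy of `features` with masked entries resampled.

    `mask[i]` is True for points whose features are missing (for
    example, points left empty after removing an object).
    """
    known = [i for i, m in enumerate(mask) if not m]
    known_pts = [points[i] for i in known]
    known_feats = [features[i] for i in known]
    filled = []
    for p, f, m in zip(points, features, mask):
        if m:
            filled.append(resample_feature(p, known_pts, known_feats, k))
        else:
            filled.append(list(f))
    return filled
```

Because every operation here is a sum or a division, the same computation stays differentiable with respect to the neighbor features when written in an autodiff framework, which is the property the name "differentiable neural-point resampling" suggests.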