Dyn-E: Local Appearance Editing of Dynamic Neural Radiance Fields

Shangzan Zhang, Sida Peng, Yinji ShenTu, Qing Shuai, Tianrun Chen, Kaicheng Yu, Hujun Bao, Xiaowei Zhou

TL;DR

This work tackles local appearance editing of dynamic NeRFs by introducing Dyn-E, a plug-in local surface representation that lifts a user-edited 2D region into 3D and converts it into a local density/color field that can be rendered together with the original dynamic NeRF. An invertible motion representation network is trained to warp this local surface across frames, enabling temporally consistent edits without overfitting the entire dynamic scene. The approach is compatible with multiple dynamic NeRF variants and is regularized with Laplacian smoothing and photometric alignment; scene flow is trained by distillation to balance 3D supervision against memory efficiency. Experiments on NVIDIA Dynamic Scenes, Nerfies-HyperNeRF-CoNeRF, and ZJU-MoCap show improved temporal consistency and photo-realism over the compared baselines, indicating the method's potential for practical 3D-aware, frame-accurate appearance editing of dynamic scenes.

Abstract

Recently, the editing of neural radiance fields (NeRFs) has gained considerable attention, but most prior works focus on static scenes, while research on the appearance editing of dynamic scenes is relatively lacking. In this paper, we propose a novel framework to edit the local appearance of dynamic NeRFs by manipulating pixels in a single frame of the training video. Specifically, to locally edit the appearance of dynamic NeRFs while preserving unedited regions, we introduce a local surface representation of the edited region, which can be inserted into and rendered along with the original NeRF and warped to arbitrary other frames through a learned invertible motion representation network. With our method, users without professional expertise can easily add desired content to the appearance of a dynamic scene. We extensively evaluate our approach on various scenes and show that it achieves spatially and temporally consistent editing results. Notably, our approach is versatile and applicable to different variants of dynamic NeRF representations.

Paper Structure

This paper contains 25 sections, 11 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Illustration of our pipeline. Given a single edited image and a dynamic NeRF, we first lift the edited region to the 3D space through rendered depth maps to form a textured mesh. Then, we train an invertible network to propagate the textured mesh to other frames. Finally, we combine the textured mesh with the original dynamic NeRF and render them to obtain the final results.
  • Figure 2: Qualitative comparisons. We generate more realistic results than the baseline methods.
  • Figure 3: Importance of handling the occlusion relationship. We show the results of our method with and without occlusion handling. The baseline "Ours w/o Occ" fails to handle occlusion correctly, so edited content that lies behind the human body remains visible.
  • Figure 4: Qualitative results on the Nerfies-HyperNeRF-CoNeRF dataset. More results are shown in the supplementary video.
  • Figure 5: Qualitative results on the ZJU-MoCap dataset. More results are shown in the supplementary video.
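
The lifting step mentioned in the Figure 1 caption (edited 2D region to 3D through rendered depth maps) can be illustrated with a standard pinhole unprojection. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `lift_edited_pixels`, the z-depth convention, the intrinsics matrix `K`, and the `cam_to_world` pose are all hypothetical choices made for illustration, and the paper may use a different camera model or depth definition.

```python
import numpy as np

def lift_edited_pixels(depth, mask, K, cam_to_world):
    """Unproject masked pixels to 3D world-space points.

    Assumptions (not taken from the paper):
      depth        : (H, W) z-depth along the camera's optical axis
      mask         : (H, W) boolean mask of the user-edited region
      K            : (3, 3) pinhole intrinsics
      cam_to_world : (4, 4) camera-to-world transform
    Returns an (N, 3) array of points, one per masked pixel.
    """
    v, u = np.nonzero(mask)                 # pixel rows (v) and columns (u)
    z = depth[v, u].astype(np.float64)      # depth of each edited pixel

    # Homogeneous pixel coordinates, shape (3, N).
    pix = np.stack([u.astype(np.float64), v.astype(np.float64), np.ones_like(z)])

    # Back-project: K^-1 gives a ray with unit z, scaled by the z-depth.
    pts_cam = (np.linalg.inv(K) @ pix) * z

    # Move the points from camera space to world space.
    pts_h = np.vstack([pts_cam, np.ones((1, pts_cam.shape[1]))])
    return (cam_to_world @ pts_h)[:3].T

if __name__ == "__main__":
    depth = np.full((48, 96), 2.0)          # toy depth map: constant 2 units
    mask = np.zeros((48, 96), dtype=bool)
    mask[20:28, 30:40] = True               # toy edited region
    K = np.array([[100.0, 0.0, 32.0],
                  [0.0, 100.0, 24.0],
                  [0.0, 0.0, 1.0]])
    points = lift_edited_pixels(depth, mask, K, np.eye(4))
    print(points.shape)
```

The resulting points would still need to be connected into a textured mesh (for example, by triangulating neighboring pixels) before they could serve as the local surface the caption describes; that step is omitted here.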