ViCA-NeRF: View-Consistency-Aware 3D Editing of Neural Radiance Fields
Jiahua Dong, Yu-Xiong Wang
TL;DR
ViCA-NeRF addresses multi-view consistent 3D editing of NeRFs from text instructions. It introduces two regularization signals, depth-guided geometric correspondence and learned alignment of latent codes in a 2D diffusion model, to propagate edits from edited key views to the full scene. The method operates in two stages: first, key views are edited and blended into a coherent dataset; second, the NeRF is refined on the updated data, aided by warm-up and post-refinement. Experiments show improved consistency and detail and a roughly 3x speedup over Instruct-NeRF2NeRF, enabling efficient, controllable 3D editing across diverse scenes. The code is publicly available.
Abstract
We introduce ViCA-NeRF, the first view-consistency-aware method for 3D editing with text instructions. In addition to the implicit neural radiance field (NeRF) modeling, our key insight is to exploit two sources of regularization that explicitly propagate the editing information across different views, thus ensuring multi-view consistency. For geometric regularization, we leverage the depth information derived from NeRF to establish image correspondences between different views. For learned regularization, we align the latent codes in the 2D diffusion model between edited and unedited images, enabling us to edit key views and propagate the update throughout the entire scene. Incorporating these two strategies, our ViCA-NeRF operates in two stages. In the first stage, we blend edits from different views to create a preliminary 3D edit. This is followed by a second stage of NeRF training, dedicated to further refining the scene's appearance. Experimental results demonstrate that ViCA-NeRF provides more flexible and efficient (3 times faster) editing with higher levels of consistency and detail than the state of the art. Our code is publicly available.
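To make the geometric regularization concrete, the following is a minimal sketch of how a depth value can link a pixel in one view to its corresponding pixel in another. It is not the authors' implementation. It assumes a pinhole camera with shared intrinsics `K`, camera-to-world poses as 4x4 matrices, a z-forward camera convention, and depth measured along the camera z-axis (NeRF codebases often use other conventions or ray distance instead). The function name `reproject_pixel` and all parameter names are hypothetical.

```python
import numpy as np

def reproject_pixel(uv, depth, K, c2w_src, c2w_tgt):
    """Map a source-view pixel with known depth to target-view pixel coordinates.

    Assumptions (illustrative, not taken from the paper): pinhole model,
    z-forward camera axes, depth is the z-coordinate in the source camera frame.
    Returns (uv_target, z_target), where z_target is the depth in the target frame.
    """
    u, v = uv
    # Back-project the pixel into the source camera frame.
    p_src = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Source camera frame -> world frame.
    p_world = c2w_src[:3, :3] @ p_src + c2w_src[:3, 3]
    # World frame -> target camera frame (inverse of a rigid transform).
    p_tgt = c2w_tgt[:3, :3].T @ (p_world - c2w_tgt[:3, 3])
    # Perspective projection into the target image.
    uvw = K @ p_tgt
    return uvw[:2] / uvw[2], p_tgt[2]

# Toy setup: focal length 100 px, principal point at (50, 50).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
src_pose = np.eye(4)          # source camera at the world origin
tgt_pose = np.eye(4)
tgt_pose[0, 3] = 1.0          # target camera shifted 1 unit along +x

# The centre pixel at depth 2 lands at (0, 50) in the shifted view.
uv_tgt, z_tgt = reproject_pixel((50.0, 50.0), 2.0, K, src_pose, tgt_pose)
```

In practice a correspondence like this would only be trusted where the reprojected depth `z_tgt` agrees with the depth rendered in the target view, since a mismatch indicates occlusion; that check is omitted here for brevity.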
