DATENeRF: Depth-Aware Text-based Editing of NeRFs
Sara Rojas, Julien Philip, Kai Zhang, Sai Bi, Fujun Luan, Bernard Ghanem, Kalyan Sunkavalli
TL;DR
DATENeRF tackles multiview-consistent, text-based editing of NeRF scenes by using the scene's depth as a geometric bridge between independent 2D diffusion edits. The method first applies a depth-conditioned ControlNet so that each 2D edit respects the scene geometry. It then uses a projection-based inpainting scheme that initializes each view with pixels reprojected from already-edited views before a full inpainting pass. Finally, it optimizes the NeRF to fuse the edited images into the 3D volume. Compared with prior approaches, this combination yields higher-fidelity, more photorealistic textures and stronger geometric consistency, and it also supports edge-guided edits and object insertion. The approach speeds up convergence and offers broader editing control. However, it struggles with edits that require large geometric changes, and its realistic results raise ethical concerns about content manipulation.
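The reprojection step can be illustrated with standard depth-based forward warping. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation. It assumes a shared pinhole intrinsic matrix `K`, a known 4x4 source-to-target camera transform, and per-pixel z-depth for the edited source view (for example, depth rendered by the NeRF). The function name `reproject_edit` is hypothetical. It returns the warped colors, a z-buffer, and a mask of target pixels that received a reprojected color. Pixels outside that mask are the holes an inpainting model would still need to fill.

```python
import numpy as np


def reproject_edit(src_rgb, src_depth, K, T_src_to_tgt, tgt_shape):
    """Forward-warp an edited source image into a target view using depth.

    Hypothetical helper; illustrative assumptions, not from the paper:
      src_rgb       (H, W, 3) edited source image
      src_depth     (H, W) z-depth of each source pixel (e.g. rendered by a NeRF)
      K             (3, 3) pinhole intrinsics shared by both views
      T_src_to_tgt  (4, 4) rigid transform from source to target camera coordinates
      tgt_shape     (Ht, Wt) target image size
    Returns (warped_rgb, zbuf, filled); zbuf is inf where nothing landed.
    """
    H, W = src_depth.shape
    Ht, Wt = tgt_shape
    v, u = np.mgrid[0:H, 0:W]  # row (v) and column (u) indices
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)])  # homogeneous pixels, (3, N)

    # Back-project each pixel: X_src = depth * K^-1 [u, v, 1]^T
    pts = (np.linalg.inv(K) @ pix) * src_depth.ravel()
    pts = np.vstack([pts, np.ones((1, H * W))])  # (4, N)

    # Move points into the target camera frame and project them with K.
    pts_tgt = (T_src_to_tgt @ pts)[:3]
    z = pts_tgt[2]
    front = z > 1e-6  # drop points at or behind the target camera
    proj = K @ pts_tgt[:, front]
    ut = np.round(proj[0] / proj[2]).astype(int)
    vt = np.round(proj[1] / proj[2]).astype(int)
    zt = z[front]
    colors = src_rgb.reshape(-1, 3)[front]

    # Keep only projections that land inside the target image.
    inside = (ut >= 0) & (ut < Wt) & (vt >= 0) & (vt < Ht)
    ut, vt, zt, colors = ut[inside], vt[inside], zt[inside], colors[inside]

    warped = np.zeros((Ht, Wt, 3), dtype=src_rgb.dtype)
    zbuf = np.full((Ht, Wt), np.inf)
    # Z-buffering: visit points far-to-near so the nearest point is written last.
    for i in np.argsort(-zt):
        warped[vt[i], ut[i]] = colors[i]
        zbuf[vt[i], ut[i]] = zt[i]
    filled = np.isfinite(zbuf)
    return warped, zbuf, filled
```

For example, with `K = [[10, 0, 2], [0, 10, 2], [0, 0, 1]]`, a constant depth of 10, and a target camera shifted by one unit along x, every pixel moves one column to the right. The leftmost target column stays empty and would be handed to the inpainter. An explicit far-to-near loop is used instead of a vectorized scatter because NumPy does not guarantee which value wins when an advanced-index assignment repeats an index.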
Abstract
Recent advancements in diffusion models have shown remarkable proficiency in editing 2D images based on text prompts. However, extending these techniques to edit scenes in Neural Radiance Fields (NeRF) is complex, as editing individual 2D frames can result in inconsistencies across multiple views. Our crucial insight is that a NeRF scene's geometry can serve as a bridge to integrate these 2D edits. Utilizing this geometry, we employ a depth-conditioned ControlNet to enhance the coherence of each 2D image modification. Moreover, we introduce an inpainting approach that leverages the depth information of NeRF scenes to distribute 2D edits across different images, ensuring robustness against errors and resampling challenges. Our results reveal that this methodology achieves more consistent, lifelike, and detailed edits than existing leading methods for text-driven NeRF scene editing.
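One plausible way to make depth-based propagation robust to geometry errors is to trust a reprojected pixel only when its depth agrees with the target view's own depth. Any remaining pixels in the edit region are left for the inpainting model. The sketch below illustrates that idea. The relative-depth test, the `rel_tol` threshold, the `edit_mask` input, and the function name `build_inpainting_input` are all illustrative assumptions, not the paper's stated criterion. The `warped_rgb` and `warped_depth` inputs are assumed to come from forward-warping an already-edited view, with infinite depth wherever nothing was reprojected.

```python
import numpy as np


def build_inpainting_input(tgt_rgb, tgt_depth, edit_mask, warped_rgb, warped_depth, rel_tol=0.05):
    """Merge reprojected edits into a target view before inpainting (hypothetical helper).

    tgt_rgb       (H, W, 3) current target image
    tgt_depth     (H, W) target z-depth, assumed rendered by the NeRF
    edit_mask     (H, W) bool, region the text edit is allowed to change (assumption)
    warped_rgb    (H, W, 3) colors reprojected from an already-edited view
    warped_depth  (H, W) their target-frame depth, inf where nothing landed
    Returns (init_rgb, holes): the initialization image and the pixels of
    the edit region that still have no trusted reprojected color.
    """
    # A reprojected pixel is trusted only if its depth matches the target
    # geometry; a large mismatch suggests occlusion or a depth error.
    agree = np.abs(warped_depth - tgt_depth) <= rel_tol * tgt_depth
    use_warp = edit_mask & agree
    # Outside the edit region, or where the warp is untrusted, keep the target colors.
    init_rgb = np.where(use_warp[..., None], warped_rgb, tgt_rgb)
    holes = edit_mask & ~agree
    return init_rgb, holes
```

A relative tolerance is chosen here because absolute depth errors from a NeRF tend to grow with distance. A fixed absolute threshold would be an equally reasonable assumption.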
