ReplaceAnything3D: Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields
Edward Bartrum, Thu Nguyen-Phuoc, Chris Xie, Zhengqin Li, Numair Khan, Armen Avetisyan, Douglas Lanman, Lei Xiao
TL;DR
ReplaceAnything3D (RAM3D) tackles the challenge of editing 3D scenes from text prompts while preserving multi-view consistency. It introduces a two-stage Erase-and-Replace pipeline that first erases a target object and inpaints the background behind it, then generates a new object conditioned on a replacement prompt, all within a Bubble-NeRF framework that keeps computation localized to the edited region. By distilling 2D diffusion priors through HiFA-inspired losses and integrating a scene-aware LDM inpainting model, RAM3D achieves coherent, photorealistic edits; it can also remove or add objects and supports personalized content via DreamBooth-style fine-tuning. The approach yields improved visual fidelity and cross-view coherence across forward-facing and 360° scenes, offering a versatile tool for VR/MR, gaming, and film production.
Abstract
We introduce the ReplaceAnything3D model (RAM3D), a novel text-guided 3D scene editing method that enables the replacement of specific objects within a scene. Given multi-view images of a scene, a text prompt describing the object to replace, and a text prompt describing the new object, our Erase-and-Replace approach can effectively swap objects in the scene with newly generated content while maintaining 3D consistency across multiple viewpoints. We demonstrate the versatility of ReplaceAnything3D by applying it to various realistic 3D scenes, showcasing results of modified foreground objects that are well-integrated with the rest of the scene without affecting its overall integrity.
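To make the Erase-and-Replace data flow concrete, the following is a minimal toy sketch on a single 2D grayscale view. It is not the authors' implementation: the actual method operates on neural radiance fields and distills a diffusion inpainting model, whereas here the names `composite`, `erase_and_replace`, `inpaint_background`, and `generate_object` are all hypothetical, and the two callables are simple stand-ins for the inpainting and generation models. The sketch only illustrates the ordering of the two stages and the mask-based compositing of new content over the unchanged scene.

```python
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image, values in [0, 1]
Mask = List[List[float]]   # 1.0 inside the edited region, 0.0 outside


def composite(foreground: Image, background: Image, alpha: Mask) -> Image:
    """Per-pixel blend: alpha * foreground + (1 - alpha) * background."""
    return [
        [a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(foreground, background, alpha)
    ]


def erase_and_replace(
    view: Image,
    mask: Mask,
    inpaint_background: Callable[[Image, Mask], Image],  # hypothetical stand-in
    generate_object: Callable[[Image, Mask], Image],     # hypothetical stand-in
) -> Image:
    # Stage 1 (erase): fill the masked region with inpainted background,
    # leaving every pixel outside the mask untouched.
    background = composite(inpaint_background(view, mask), view, mask)
    # Stage 2 (replace): composite the newly generated object over the
    # erased background, again only inside the mask.
    return composite(generate_object(background, mask), background, mask)


if __name__ == "__main__":
    view = [[0.2, 0.9], [0.2, 0.2]]   # bright pixel = object to replace
    mask = [[0.0, 1.0], [0.0, 0.0]]   # marks the object's pixel
    flat_inpaint = lambda img, m: [[0.2] * len(r) for r in img]
    flat_object = lambda img, m: [[0.5] * len(r) for r in img]
    print(erase_and_replace(view, mask, flat_inpaint, flat_object))
```

In this toy setting, pixels outside the mask pass through both stages unchanged, which mirrors the stated goal of editing the foreground object without affecting the rest of the scene.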
