3D-Consistent Multi-View Editing by Diffusion Guidance
Josef Bengtson, David Nilsson, Dong In Lee, Fredrik Kahl
TL;DR
This work addresses the geometric and photometric inconsistencies that arise when image-editing methods are applied independently to multiple views of a 3D scene. It introduces a training-free diffusion-guidance framework that enforces cross-view consistency: a consistency loss, computed from points matched across the unedited views, guides diffusion sampling toward coherent edits. The method supports both dense and sparse view editing and can directly refine 3D Gaussian Splat models, producing sharp edits that stay faithful to the text prompt. Experiments show improved multi-view consistency over existing baselines, competitive or superior text-alignment fidelity, and effective sparse-view editing with practical compute. The approach builds on existing 2D diffusion editors rather than requiring a dedicated 3D editing model.
Abstract
Recent advancements in diffusion models have greatly improved text-based image editing, yet methods that edit images independently often produce geometrically and photometrically inconsistent results across different views of the same scene. Such inconsistencies are particularly problematic when editing 3D representations such as NeRFs or Gaussian Splat models. We propose a training-free diffusion framework that enforces multi-view consistency during the image editing process. The key assumption is that corresponding points in the unedited images should undergo similar transformations after editing. To achieve this, we introduce a consistency loss that guides the diffusion sampling toward coherent edits. The framework is flexible and can be combined with a wide range of image editing methods, supporting both dense and sparse multi-view editing setups. Experimental results show that our approach significantly improves 3D consistency compared to existing multi-view editing methods. We also show that this increased consistency enables high-quality Gaussian Splat editing with sharp details and strong fidelity to user-specified text prompts. Please refer to our project page for video results: https://3d-consistent-editing.github.io/
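The key assumption stated in the abstract, that corresponding points should undergo similar transformations after editing, can be illustrated with a small sketch. This is not the authors' implementation: it assumes that the "transformation" at a point is simply the per-pixel color change (edited minus original), that the loss is a mean squared difference between those changes, and that guidance is a plain gradient step on the current edited estimates. The function names `consistency_loss_and_grad` and `guided_update`, the `step` parameter, and the integer pixel correspondences are all hypothetical choices made for illustration.

```python
import numpy as np


def consistency_loss_and_grad(orig_a, orig_b, edit_a, edit_b, pts_a, pts_b):
    """Illustrative consistency loss between two views (assumed form, not the paper's).

    orig_*, edit_*: float arrays of shape (H, W, 3).
    pts_a, pts_b:   int arrays of shape (N, 2) holding matched (row, col) pixels,
                    where pts_a[i] in view A corresponds to pts_b[i] in view B.
    Returns the scalar loss and its gradients with respect to edit_a and edit_b.
    """
    ra, ca = pts_a[:, 0], pts_a[:, 1]
    rb, cb = pts_b[:, 0], pts_b[:, 1]

    # Color change introduced by the edit at each matched point.
    delta_a = edit_a[ra, ca] - orig_a[ra, ca]
    delta_b = edit_b[rb, cb] - orig_b[rb, cb]

    # Matched points should change in the same way, so penalize the difference.
    diff = delta_a - delta_b
    n = len(pts_a)
    loss = float(np.sum(diff ** 2) / n)

    # Analytic gradients; np.add.at accumulates correctly if a pixel repeats.
    grad_a = np.zeros_like(edit_a)
    grad_b = np.zeros_like(edit_b)
    np.add.at(grad_a, (ra, ca), 2.0 * diff / n)
    np.add.at(grad_b, (rb, cb), -2.0 * diff / n)
    return loss, grad_a, grad_b


def guided_update(edit_a, edit_b, grad_a, grad_b, step=0.1):
    """One guidance step: move both edited estimates down the loss gradient."""
    return edit_a - step * grad_a, edit_b - step * grad_b
```

In a diffusion sampler, a step like `guided_update` would be applied to the predicted clean images at each denoising iteration, nudging the views toward agreement while the editor itself stays untouched. Comparing color changes rather than raw colors is one simple reading of "similar transformations"; the actual loss may operate on different quantities.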
