CoreEditor: Consistent 3D Editing via Correspondence-constrained Diffusion
Zhe Zhu, Honghua Chen, Peng Li, Mingqiang Wei
TL;DR
CoreEditor addresses inconsistent multi-view edits in text-driven 3D editing by introducing a correspondence-constrained attention mechanism that enforces precise cross-view interactions during diffusion. It pairs a correspondence strategy supported jointly by geometric and semantic cues with a Reference Attention pipeline to align global editing styles and maintain local consistency without fine-tuning the diffusion model. The key contributions are Correspondence-constrained Attention (CCA), the geometric-plus-semantic correspondence framework, and a selective editing pipeline that gives users control over diverse yet faithful edits. Empirical results across seven scenes show better 3D consistency, sharper textures, and higher semantic fidelity than state-of-the-art baselines, while remaining efficient and deployable zero-shot. This work advances practical, high-quality 3D editing workflows for neural scene representations such as Gaussian Splatting.
Abstract
Text-driven 3D editing seeks to modify 3D scenes according to textual descriptions, and most existing approaches tackle this by adapting pre-trained 2D image editors to multi-view inputs. However, without explicit control over multi-view information exchange, they often fail to maintain cross-view consistency, leading to insufficient edits and blurry details. We introduce CoreEditor, a novel framework for consistent text-driven 3D editing. The key innovation is a correspondence-constrained attention mechanism that enforces precise interactions between pixels expected to remain consistent throughout the diffusion denoising process. Rather than relying solely on geometric alignment, we further incorporate semantic similarity estimated during denoising, enabling more reliable correspondence modeling and robust multi-view editing. In addition, we design a selective editing pipeline that allows users to choose preferred results from multiple candidates, offering greater flexibility and user control. Extensive experiments show that CoreEditor produces high-quality, 3D-consistent edits with sharper details, significantly outperforming prior methods.
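To make the idea of correspondence-constrained attention concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the constraint can be expressed as a boolean mask over query-key pairs, that geometric matches are filtered by a cosine-similarity threshold on denoising features, and that a query with no surviving match falls back to ordinary attention. The function names, the threshold `tau`, and the fallback rule are all hypothetical and do not come from the abstract.

```python
import numpy as np

def co_supported_mask(geo_mask, feats_q, feats_k, tau=0.5):
    """Hypothetical rule: keep a geometric match only if the two pixels'
    features are also semantically similar (cosine similarity >= tau)."""
    fq = feats_q / (np.linalg.norm(feats_q, axis=1, keepdims=True) + 1e-8)
    fk = feats_k / (np.linalg.norm(feats_k, axis=1, keepdims=True) + 1e-8)
    sim = fq @ fk.T                       # (Nq, Nk) cosine similarities
    return geo_mask & (sim >= tau)

def correspondence_constrained_attention(q, k, v, corr_mask):
    """Scaled dot-product attention restricted to corresponding pixels.

    q: (Nq, d) queries from the view being denoised
    k: (Nk, d) keys and v: (Nk, dv) values from other views
    corr_mask: (Nq, Nk) bool, True where query i and key j should agree
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)

    # Assumption: a query with no correspondence attends to every key,
    # which also avoids an all -inf row in the softmax.
    has_match = corr_mask.any(axis=1, keepdims=True)
    mask = np.where(has_match, corr_mask, True)

    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

In this sketch, a query pixel with exactly one correspondence simply copies that key's value, while pixels with several correspondences blend them by attention weight. This is one simple way to read "precise interactions between pixels expected to remain consistent".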
