C3Editor: Achieving Controllable Consistency in 2D Model for 3D Editing
Zeng Tao, Zheng Ding, Zeyuan Chen, Xiang Zhang, Leizhi Li, Zhuowen Tu
TL;DR
C3Editor tackles view inconsistency in the 2D priors used for 3D editing. It selects a ground-truth (GT) view and its edited image and uses them to steer the fine-tuning of a view-consistent 2D editing model. The method runs a two-phase, GT-guided optimization: intra-GT prior fitting (via LoRA_gt) and progressive view propagation for inter-view consistency (via LoRA_mv). Together, these phases enable controllable edits across all views. The edited 2D views are then used to update the 3D representation (Gaussian Splatting). Compared with baselines, C3Editor achieves higher CLIP scores for both text- and image-driven alignment and a lower FID. Overall, it delivers more coherent 2D-to-3D edits with user-controllable editing directions, a practical improvement for multi-view 3D editing. However, it requires scene-specific 2D models and leaves generalization to fully generic multi-view editing for future work.
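The TL;DR does not say how views are ordered during progressive view propagation. The sketch below illustrates one plausible schedule and rests on our own assumptions, not the paper's. Each camera is reduced to an azimuth in degrees, and propagation expands outward from the GT view in rings of width `step_deg`. Views nearer the GT view therefore come before views farther from it. The names `propagation_schedule`, `angular_distance` and `step_deg` are hypothetical.

```python
import math
from typing import Dict, List


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def propagation_schedule(azimuths: List[float], gt_index: int, step_deg: float) -> List[List[int]]:
    """Group view indices into stages of increasing angular distance from the GT view.

    Assumption (hypothetical, for illustration): stage 0 holds only the GT view,
    and stage k >= 1 holds the views whose distance from the GT view lies in
    ((k - 1) * step_deg, k * step_deg]. Views at distance 0 go to stage 1.
    """
    gt_az = azimuths[gt_index]
    stages: Dict[int, List[int]] = {0: [gt_index]}
    for i, az in enumerate(azimuths):
        if i == gt_index:
            continue
        d = angular_distance(az, gt_az)
        k = max(1, math.ceil(d / step_deg))
        stages.setdefault(k, []).append(i)
    # Return stages in propagation order: nearest ring first.
    return [stages[k] for k in sorted(stages)]
```

For example, take azimuths `[0, 30, 60, 90, 180, 270, 330]` with GT view `0` and `step_deg=60`. The stages are `[[0], [1, 2, 6], [3, 5], [4]]`. Under this assumption, each stage could be edited using the already-consistent views of earlier stages as references.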
Abstract
Existing 2D-lifting-based 3D editing methods often struggle with inconsistency. This stems from the lack of view-consistent 2D editing models and the difficulty of ensuring consistent editing across multiple views. To address these issues, we propose C3Editor, a controllable and consistent 2D-lifting-based 3D editing framework. Given an original 3D representation and a text-based editing prompt, our method establishes a view-consistent 2D editing model to achieve superior 3D editing results. The process begins with the controlled selection of a ground truth (GT) view and its corresponding edited image as the optimization target, which allows user-defined manual edits. Next, we fine-tune the 2D editing model within the GT view and across multiple views to align with the GT-edited image while ensuring multi-view consistency. To meet the distinct requirements of GT view fitting and multi-view consistency, we introduce separate LoRA modules for targeted fine-tuning. Our approach delivers more consistent and controllable 2D and 3D editing results than existing 2D-lifting-based methods, outperforming them in both qualitative and quantitative evaluations.
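The abstract states that separate LoRA modules serve GT view fitting and multi-view consistency, but it gives neither their placement nor their training schedule. The sketch below rests on our own assumptions. A single frozen linear layer carries two standard LoRA adapters, named `gt` and `mv` after LoRA_gt and LoRA_mv. Each adapter adds its scaled low-rank update `scale * B(A x)` to the output. A hypothetical `set_phase` marks only the current phase's adapter as trainable. All class and method names are hypothetical, and plain Python lists stand in for tensors so the example runs without a deep-learning framework.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


@dataclass
class LoRAAdapter:
    down: Matrix            # A: r x d_in (projects input to rank r)
    up: Matrix              # B: d_out x r (projects back to output size)
    scale: float            # conventionally alpha / r
    trainable: bool = False

    def delta(self, x: Vector) -> Vector:
        """Low-rank update scale * B(A x) added on top of the frozen layer."""
        return [self.scale * v for v in matvec(self.up, matvec(self.down, x))]


@dataclass
class LinearWithLoRAs:
    weight: Matrix  # frozen base weight, never marked trainable here
    adapters: Dict[str, LoRAAdapter] = field(default_factory=dict)

    def forward(self, x: Vector) -> Vector:
        """Base output plus the sum of every adapter's low-rank update."""
        y = matvec(self.weight, x)
        for adapter in self.adapters.values():
            y = [a + b for a, b in zip(y, adapter.delta(x))]
        return y

    def set_phase(self, phase: str) -> None:
        """Assumed schedule: only the adapter named after the phase is trained."""
        for name, adapter in self.adapters.items():
            adapter.trainable = name == phase

    def trainable_names(self) -> List[str]:
        return [name for name, a in self.adapters.items() if a.trainable]
```

Keeping the adapters separate places the GT-fitting update and the multi-view consistency update in different parameters. In this sketch, training `mv` therefore leaves the low-rank update already learned by `gt` untouched. Whether the paper freezes LoRA_gt during the multi-view phase is an assumption here.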
