View-Consistent 3D Editing with Gaussian Splatting

Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang

TL;DR

VcEdit tackles multi-view inconsistency in image-guided 3DGS editing by integrating two consistency modules into an iterative editing pipeline. The Cross-attention Consistency Module inverse-renders the per-view cross-attention maps of the diffusion backbone into a single unified 3D map, then re-renders that map to 2D so the views share consistent attention. The Editing Consistency Module calibrates the edited guidance images through a fast 3DGS-based refinement and local blending. Repeating the process over several cycles progressively refines both the 3DGS and the guidance images, yielding coherent edits across diverse scenes. In qualitative and quantitative evaluations, including a CLIP-based directional similarity and a human user study, VcEdit outperforms baselines such as DDS and GSEditor, producing view-consistent, high-fidelity 3D edits with reduced mode collapse.
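The inverse-render and re-render step of the Cross-attention Consistency Module can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes each view provides a hypothetical `(pixels, gaussians)` matrix of rendering weights (how much each Gaussian contributes to each pixel), and it uses a plain weighted average as the aggregation rule. The function name `unify_attention` and all parameter names are made up for illustration.

```python
import numpy as np

def unify_attention(weights, attn_maps, eps=1e-8):
    """Toy cross-view attention unification (illustrative assumption only).

    weights:   list of (P, G) arrays, one per view; entry [p, g] is the
               assumed contribution of Gaussian g to pixel p in that view.
    attn_maps: list of (P,) arrays, one flattened 2D attention map per view.
    Returns the per-Gaussian 3D attention and the re-rendered 2D maps.
    """
    num = np.zeros(weights[0].shape[1])
    den = np.zeros_like(num)
    for W, a in zip(weights, attn_maps):
        num += W.T @ a        # inverse-render: push pixel attention onto Gaussians
        den += W.sum(axis=0)  # total weight each Gaussian received from this view
    attn_3d = num / (den + eps)  # weighted average over all views
    # Re-render: each pixel becomes a weighted mix of the shared 3D values.
    rendered = [(W @ attn_3d) / (W.sum(axis=1) + eps) for W in weights]
    return attn_3d, rendered

# Two views of two Gaussians; the second view sees them in swapped pixel order.
W1 = np.array([[1.0, 0.0], [0.0, 1.0]])
W2 = np.array([[0.0, 1.0], [1.0, 0.0]])
attn_3d, rendered = unify_attention([W1, W2], [np.array([1.0, 0.0]),
                                               np.array([0.2, 0.6])])
```

In this toy case the two views initially disagree about the first Gaussian (attention 1.0 versus 0.6); after unification both re-rendered maps assign it the same averaged value, which is the kind of cross-view agreement the module is described as enforcing.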

Abstract

The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations. Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes. Further video results are shown in http://vcedit.github.io.
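The CLIP-based directional similarity used to evaluate such edits is commonly defined as the cosine similarity between the change in image embeddings (original to edited) and the change in text embeddings (source prompt to target prompt). The sketch below assumes the four embeddings have already been computed by a CLIP image and text encoder; the function and argument names are hypothetical, and the exact evaluation protocol of the paper may differ.

```python
import numpy as np

def directional_similarity(img_src, img_edit, txt_src, txt_tgt, eps=1e-8):
    """Cosine similarity between the image-space and text-space edit directions.

    All four inputs are assumed to be precomputed CLIP embedding vectors
    of equal length (obtaining them is outside this sketch).
    """
    d_img = np.asarray(img_edit, dtype=float) - np.asarray(img_src, dtype=float)
    d_txt = np.asarray(txt_tgt, dtype=float) - np.asarray(txt_src, dtype=float)
    denom = np.linalg.norm(d_img) * np.linalg.norm(d_txt) + eps
    return float(d_img @ d_txt / denom)

# Toy 2D "embeddings": the image moves in the same direction as the text.
score = directional_similarity([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 2.0])
```

A score near 1 means the rendered views changed in the direction the prompt asked for; a score near 0 or below means the edit is unrelated to or opposes the prompt. For a 3D edit, one would typically average the score over many rendered views.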

Paper Structure

This paper contains 30 sections, 9 equations, 12 figures, 1 table, 1 algorithm.

Figures (12)

  • Figure 1: Capability highlight of our method: VcEdit. Given a source 3D Gaussian Splatting and a user-specified text prompt, our VcEdit enables versatile scene and object editing. By ensuring multi-view consistent image guidance, VcEdit alleviates artifacts and excels in high-quality editing.
  • Figure 2: (a): Current image-guided 3DGS editing pipeline and its multi-view inconsistency issue: The rendered views of a man are separately edited to different styles of clowns, leading to the mode collapse issue of learned 3DGS. (b): The iterative pattern and our consistency modules deployed in each iteration. The 3DGS is progressively guided to a coherent style that aligns with the "clown" through the iterative pattern.
  • Figure 3: The pipeline of our VcEdit: VcEdit employs an image-guided editing pipeline. In the image editing stage, the Cross-attention Consistency Module and Editing Consistency Module are employed to ensure the multi-view consistency of edited images. A detailed overview is provided in the paper's image editing section.
  • Figure 4: Qualitative comparison with DDS (Hertz et al., 2023) and GSEditor (Chen et al., 2023): The topmost row shows the original views, while the bottom rows show rendered views of the edited 3DGS. VcEdit excels by effectively addressing the multi-view inconsistency, resulting in superior editing quality. In contrast, the other methods suffer from mode collapse and exhibit flickering artifacts.
  • Figure 5: Extensive results of VcEdit: Our method is capable of various editing tasks, including face, object, and large-scale scene editing. The leftmost column shows the original view, while the four columns to the right show rendered views of the edited 3DGS.
  • ...and 7 more figures