
TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation

Chaofan Luo, Donglin Di, Xun Yang, Yongjia Ma, Zhou Xue, Chen Wei, Yebin Liu

TL;DR

This work tackles the challenge of preserving multi-view consistency in text-guided 3D editing. It introduces Trajectory-Anchored Multi-View Editing (TrAME), which couples 2D view edits with 3D updates via the Trajectory-Anchored Scheme (TAS) and enforces cross-view coherence with the View-Consistent Attention Control (VCAC) module. A theoretical bridge is drawn between optimization-based SDS and reconstruction-based DDIM/DDCM approaches, offering a unified perspective for design choices. Empirical results show improved editing quality and view consistency over state-of-the-art methods, with extensive ablations validating the contributions. The approach enables more reliable, progressively updated 3D scene edits, and the authors state that the code will be released publicly.

Abstract

Despite significant strides in the field of 3D scene editing, current methods encounter substantial challenges, particularly in preserving 3D consistency during the multi-view editing process. To tackle this challenge, we propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS) with a dual-branch editing mechanism. Specifically, TAS facilitates a tightly coupled iterative process between 2D view editing and 3D updating, preventing the accumulation of errors arising from the text-to-image process. Additionally, we explore the relationship between optimization-based methods and reconstruction-based methods, offering a unified perspective for selecting superior design choices and supporting the rationale behind TAS. We further present a tuning-free View-Consistent Attention Control (VCAC) module that leverages cross-view semantic and geometric references from the source branch to yield aligned views from the target branch during the editing of 2D views. To validate the effectiveness of our method, we analyze 2D examples to demonstrate the improved consistency achieved with the VCAC module. Further extensive quantitative and qualitative results in text-guided 3D scene editing indicate that our method achieves superior editing quality compared to state-of-the-art methods. We will make the complete codebase publicly available following the conclusion of the review process.
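
The tightly coupled alternation between 2D view editing and 3D updating described above can be illustrated with a deliberately simplified toy sketch. This is not the authors' implementation: the scene is a single scalar rather than a 3DGS, and `render`, `edit_view`, `fit_scene`, `progressive_edit`, the noise model, and the step schedule are all hypothetical stand-ins chosen for illustration. The only point it tries to show is that re-rendering from the updated scene at every step keeps per-view errors from compounding.

```python
import random

def render(scene, view_offset):
    # Toy renderer (assumption): a "view" is the scene value plus a fixed per-view offset.
    return scene + view_offset

def edit_view(view, target_view, step_fraction, rng, noise=0.05):
    # Toy stand-in for one 2D editing step: move part of the way toward the
    # target view and add a small per-view error (cross-view inconsistency).
    return view + step_fraction * (target_view - view) + rng.uniform(-noise, noise)

def fit_scene(edited_views, view_offsets):
    # Toy stand-in for the 3D update: least-squares fit of one scalar to all
    # edited views, which averages out the per-view errors.
    return sum(v - o for v, o in zip(edited_views, view_offsets)) / len(edited_views)

def progressive_edit(scene, target_scene, view_offsets, num_steps, seed=0):
    rng = random.Random(seed)
    trajectory = [scene]
    for t in range(num_steps):
        # Hypothetical schedule: the fraction grows so the last step lands on the target.
        step_fraction = 1.0 / (num_steps - t)
        # Views are re-rendered from the *updated* scene at every step, so
        # errors from earlier 2D edits are not carried forward.
        views = [render(scene, o) for o in view_offsets]
        targets = [render(target_scene, o) for o in view_offsets]
        edited = [edit_view(v, g, step_fraction, rng) for v, g in zip(views, targets)]
        scene = fit_scene(edited, view_offsets)
        trajectory.append(scene)
    return trajectory

if __name__ == "__main__":
    traj = progressive_edit(0.0, 1.0, [-0.2, 0.0, 0.3, 0.5], num_steps=10)
    print([round(s, 3) for s in traj])
```

In this toy setting the final scene value stays within the per-step noise bound of the target, because each step starts from views rendered out of the fitted scene rather than from the previous noisy edits.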

Paper Structure (17 sections, 15 equations, 9 figures, 4 tables, 1 algorithm)

Figures (9)

  • Figure 1: Illustration of the proposed method, Trajectory-Anchored Multi-View Editing for 3D Gaussian Splatting Manipulation (TrAME). Our method comprises a Trajectory-Anchored Scheme (TAS) and a View-Consistent Attention Control (VCAC) module. Given a source prompt, a target prompt, and the original 3DGS $\theta^{(0)}$ as input, the VCAC module yields 3D-consistent, progressively edited views with a single-step inference, which are used to update the 3DGS. Conversely, the views rendered from the updated 3DGS correct minor inconsistencies from previous view edits and serve as inputs for subsequent steps, thereby preventing error accumulation from the 2D editing process. This process alternately updates the 2D views and the 3DGS in a synchronized and progressive manner, producing the final edited 3DGS $\theta^{(T)}$.
  • Figure 2: The editing trajectory exhibits a smooth and incremental transition from the original image to the final edited image.
  • Figure 3: Qualitative comparison. Single-scene editing results of state-of-the-art methods and ours. Please zoom in for more geometric and textural details.
  • Figure 4: Qualitative comparison. Effect of different choices of the DDCM coefficient $\kappa$ on 2D view editing.
  • Figure 5: Qualitative results under different hyper-parameter settings. Rendered RGB and depth images illustrating the effect of different values of the DDCM coefficient $\kappa$ on 3DGS editing.
  • ...and 4 more figures