DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation

Sitian Shen, Jing Xu, Yuheng Yuan, Xingyi Yang, Qiuhong Shen, Xinchao Wang

TL;DR

DragGaussian tackles the challenge of editing 3D content with limited 3D data by combining 3D Gaussian Splatting with diffusion-based 2D editing to achieve multi-view-consistent edits. It introduces an interactive, drag-based 3D editing pipeline that uses a projection module, LoRA-fine-tuned multi-view diffusion, and subsequent 3D Gaussian refinement to produce edited appearances. The work demonstrates the feasibility of open-vocabulary, point-based 3D edits and analyzes the impact of identity-preserving fine-tuning, motion supervision, and tracking on cross-view consistency. Limitations include non-real-time performance and diffusion-model dependencies, with future work aimed at real-time 3D-native editing advances.
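The summary above mentions a projection module but does not describe its internals. As a rough illustration only, the sketch below shows one plausible reading: a user's 3D handle and target points are projected into each rendered view with a standard pinhole camera model, giving per-view 2D drag pairs that a 2D drag editor could consume. The `Camera` class, `project`, and `project_drag` are hypothetical names introduced here, not the authors' code, and the camera convention (world-to-camera rotation plus translation) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


@dataclass
class Camera:
    """Hypothetical pinhole camera (assumed convention: x_cam = R @ x_world + t)."""
    R: List[List[float]]  # 3x3 world-to-camera rotation, row-major
    t: Vec3               # world-to-camera translation
    fx: float             # focal length in pixels (x)
    fy: float             # focal length in pixels (y)
    cx: float             # principal point (x)
    cy: float             # principal point (y)


def project(point: Vec3, cam: Camera) -> Pixel:
    """Project a 3D world point to pixel coordinates in one view."""
    # World -> camera coordinates.
    xc, yc, zc = (
        sum(cam.R[i][j] * point[j] for j in range(3)) + cam.t[i]
        for i in range(3)
    )
    if zc <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide, then apply the intrinsics.
    return (cam.fx * xc / zc + cam.cx, cam.fy * yc / zc + cam.cy)


def project_drag(handle: Vec3, target: Vec3,
                 cams: List[Camera]) -> List[Tuple[Pixel, Pixel]]:
    """Return one (handle_2d, target_2d) pair per view."""
    return [(project(handle, c), project(target, c)) for c in cams]


# Two illustrative views: a front camera and one rotated 90 degrees about y.
front = Camera(R=[[1, 0, 0], [0, 1, 0], [0, 0, 1]], t=(0, 0, 2),
               fx=100, fy=100, cx=64, cy=64)
side = Camera(R=[[0, 0, -1], [0, 1, 0], [1, 0, 0]], t=(0, 0, 2),
              fx=100, fy=100, cx=64, cy=64)

pairs = project_drag((0.0, 0.0, 0.0), (0.5, 0.0, 0.0), [front, side])
for view_name, (h2d, t2d) in zip(["front", "side"], pairs):
    print(view_name, h2d, t2d)
```

In this toy setup the same 3D drag appears as a horizontal shift in the front view but as no image-space motion in the side view, because the displacement runs along that camera's viewing axis. This is one way to see why a single 3D drag has to be translated into different 2D drags per view before multi-view editing.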

Abstract

User-friendly 3D object editing is a challenging task that has recently attracted significant attention. Because direct 3D object editing is limited without 2D prior knowledge, there is growing interest in using 2D generative models for 3D editing. While existing methods like Instruct NeRF-to-NeRF offer a solution, they are often not user-friendly, largely because they rely on semantically guided editing. Among 3D representations, 3D Gaussian Splatting has emerged as a promising approach because it is efficient and inherently explicit, which facilitates precise editing. Building upon these insights, we propose DragGaussian, a 3D object drag-editing framework based on 3D Gaussian Splatting that leverages diffusion models for interactive image editing with open-vocabulary input. The framework enables users to perform drag-based editing on pre-trained 3D Gaussian object models, producing modified 2D images through multi-view consistent editing. Our contributions include the introduction of a new task, the development of DragGaussian for interactive point-based 3D editing, and validation of its effectiveness through qualitative and quantitative experiments.

Paper Structure (22 sections, 11 equations, 8 figures)

Figures (8)

  • Figure 1: Pipeline of DragGaussian.
  • Figure 2: Overview of our UI.
  • Figure 3: Drawing mask using the brush.
  • Figure 4: Stages for Multi-view Consistent Editing.
  • Figure 5: Multi-view consistent editing on a chair.
  • ...and 3 more figures