Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control

Jinbo Yan, Alan Zhao, Yixin Hu

TL;DR

Dragen3D addresses single-image 3D Gaussian generation with multi-view geometric consistency and intuitive editing. It introduces an Anchor-Gaussian VAE that encodes geometry and texture into anchor latents, and a Seed-Point-Driven pipeline that maps sparse seed points to these latents via a Seed-Anchor Mapping module, enabling drag-based deformation without relying on 2D diffusion priors. The method reports 3DGS quality comparable to state-of-the-art methods on Objaverse and Google Scanned Objects, offers interactive editing through seed-point manipulation, and supports efficient latent-space generation through a coarse-to-fine decoding scheme. By combining geometric fidelity, multi-view consistency, and user-friendly control, the framework targets artists and applications in VR, 3D modeling, and content creation.

Abstract

Single-image 3D generation has emerged as a prominent research topic, playing a vital role in virtual reality, 3D modeling, and digital content creation. However, existing methods face challenges such as a lack of multi-view geometric consistency and limited controllability during the generation process, which significantly restrict their usability. To tackle these challenges, we introduce Dragen3D, a novel approach that achieves geometrically consistent and controllable 3D generation by leveraging 3D Gaussian Splatting (3DGS). We introduce the Anchor-Gaussian Variational Autoencoder (Anchor-GS VAE), which encodes a point cloud and a single image into anchor latents and decodes these latents into 3DGS, enabling efficient latent-space generation. To enable controllable generation with multi-view geometric consistency, we propose a Seed-Point-Driven strategy: first generate sparse seed points as a coarse geometry representation, then map them to anchor latents via the Seed-Anchor Mapping Module. Geometric consistency is ensured by the sparse seed points, which are easy to learn, and users can intuitively drag the seed points to deform the final 3DGS geometry, with changes propagated through the anchor latents. To the best of our knowledge, we are the first to achieve geometrically controllable 3D Gaussian generation and editing without relying on 2D diffusion priors, while delivering 3D generation quality comparable to state-of-the-art methods.

Paper Structure

This paper contains 37 sections, 14 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Overview of the framework.
  • Figure 2: Seed-Anchor Mapping Module: (a) We use FPS to establish a correspondence between $Z$ and $\mathbf{X}_\mathcal{S}$. (b) Dimension Alignment: Encoding the seed points $\mathbf{X}_\mathcal{S}$ to obtain $Z_\mathcal{S}$, ensuring dimensional alignment with $Z$. (c) Token Alignment: Each token in the seed latent is treated as a center to partition the tokens of $Z$ into $|\mathcal{S}|$ clusters. A repeat operation is then applied to the seed latents, achieving semantic and token count alignment between $Z_\mathcal{S}$ and $Z$.
  • Figure 3: Generation of Seed Points with Multiview Geometry Consistency
  • Figure 4: Ablation study of different seed point generation methods: (a) using our method, (b) using a Transformer.
  • Figure 5: Without Dimension Alignment, seed-points-driven deformation fails
  • ...and 3 more figures
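
The Figure 2 caption mentions three operations: farthest point sampling (FPS), partitioning tokens into $|\mathcal{S}|$ clusters around seed centers, and a repeat operation that matches token counts. The following is a minimal pure-Python sketch of those three steps, not the authors' implementation. It assumes that seeds are sampled from the anchor positions and that clustering uses Euclidean distance in 3D, whereas the caption describes clustering over latent tokens. All function names and the string placeholders standing in for seed latents are hypothetical.

```python
import math
import random


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: return indices of k points that are mutually far apart."""
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # Pick the point that is farthest from the current selection.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def assign_to_nearest_seed(points, seed_idx):
    """For each point, return the position (0..k-1) of its nearest seed."""
    return [
        min(range(len(seed_idx)), key=lambda j: math.dist(p, points[seed_idx[j]]))
        for p in points
    ]


def repeat_seed_tokens(seed_tokens, assignment):
    """Repeat each seed token once per member of its cluster."""
    return [seed_tokens[j] for j in assignment]


# Toy data: 64 random anchor positions (assumed stand-ins for anchor points).
rng = random.Random(0)
anchors = [(rng.random(), rng.random(), rng.random()) for _ in range(64)]

seed_idx = farthest_point_sampling(anchors, k=8)       # sparse seed points
assignment = assign_to_nearest_seed(anchors, seed_idx)  # |S| clusters
seed_tokens = [f"z_s{j}" for j in range(len(seed_idx))]  # placeholder seed latents
aligned = repeat_seed_tokens(seed_tokens, assignment)    # one token per anchor
```

After the repeat step, `aligned` has one entry per anchor, so the seed-side sequence matches the anchor-side sequence in length, which is the token count alignment the caption refers to.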