Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control
Jinbo Yan, Alan Zhao, Yixin Hu
TL;DR
Dragen3D addresses single-image 3D Gaussian generation with multi-view geometric consistency and intuitive editing. It introduces an Anchor-Gaussian VAE that encodes geometry and texture into anchor latents, and a Seed-Point-Driven pipeline that maps sparse seed points to these latents through a Seed-Anchor Mapping module, enabling drag-based deformation without relying on 2D diffusion priors. The method reports 3DGS quality comparable to state-of-the-art methods on Objaverse and Google Scanned Objects, offers interactive editing through seed-point manipulation, and supports efficient latent-space generation through a coarse-to-fine decoding scheme. By combining geometric fidelity, multi-view consistency, and user-friendly control, the framework targets applications in VR, 3D modeling, and content creation.
Abstract
Single-image 3D generation has emerged as a prominent research topic, playing a vital role in virtual reality, 3D modeling, and digital content creation. However, existing methods suffer from a lack of multi-view geometric consistency and limited controllability during generation, which significantly restricts their usability. To tackle these challenges, we introduce Dragen3D, a novel approach that achieves geometrically consistent and controllable 3D generation by leveraging 3D Gaussian Splatting (3DGS). We introduce the Anchor-Gaussian Variational Autoencoder (Anchor-GS VAE), which encodes a point cloud and a single image into anchor latents and decodes these latents into 3DGS, enabling efficient latent-space generation. To enable generation that is both controllable and geometrically consistent across views, we propose a Seed-Point-Driven strategy: we first generate sparse seed points as a coarse geometry representation, then map them to anchor latents via the Seed-Anchor Mapping Module. Geometric consistency is ensured by the sparse seed points, which are easy to learn, and users can intuitively drag the seed points to deform the final 3DGS geometry, with changes propagated through the anchor latents. To the best of our knowledge, we are the first to achieve geometrically controllable 3D Gaussian generation and editing without relying on 2D diffusion priors, while delivering 3D generation quality comparable to state-of-the-art methods.
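To make the drag-based editing idea concrete, the following is a minimal NumPy sketch of how a drag on one seed point could propagate to denser anchors. It is an illustration under strong assumptions, not the method described above: the Gaussian falloff, the nearest-seed rule, and the names `drag_seed_points`, `propagate_to_anchors`, and `radius` are all hypothetical, and the anchors here are plain 3D positions rather than learned latents. In Dragen3D the propagation goes through the learned Seed-Anchor Mapping Module.

```python
import numpy as np


def drag_seed_points(seeds, handle_idx, offset, radius=0.5):
    """Move the seed at `handle_idx` by `offset`; nearby seeds follow.

    Hypothetical rule: displacement is weighted by a Gaussian falloff of
    the distance to the dragged seed (weight 1 at the handle itself).
    """
    dist = np.linalg.norm(seeds - seeds[handle_idx], axis=1)
    weight = np.exp(-((dist / radius) ** 2))
    return seeds + weight[:, None] * offset


def propagate_to_anchors(anchors, seeds_before, seeds_after):
    """Shift each anchor by the displacement of its nearest seed point.

    A geometric stand-in (assumption) for a learned seed-to-anchor mapping.
    """
    # Pairwise distances between anchors (A, 3) and seeds (S, 3): shape (A, S).
    d = np.linalg.norm(anchors[:, None, :] - seeds_before[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    return anchors + (seeds_after - seeds_before)[nearest]


# Toy data: two well-separated seeds, one anchor near each.
seeds = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
anchors = np.array([[0.1, 0.0, 0.0], [9.9, 0.0, 0.0]])

# Drag the first seed one unit along +y and propagate to the anchors.
moved_seeds = drag_seed_points(seeds, handle_idx=0, offset=np.array([0.0, 1.0, 0.0]))
moved_anchors = propagate_to_anchors(anchors, seeds, moved_seeds)
```

In this toy setup the anchor next to the dragged seed follows it, while the distant anchor stays essentially in place, which is the locality one would expect from a seed-point drag.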
