DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow
Ken Deng, Yuan-Chen Guo, Jingxiang Sun, Zi-Xin Zou, Yangguang Li, Xin Cai, Yan-Pei Cao, Yebin Liu, Ding Liang
TL;DR
DetailGen3D addresses the lack of geometric detail in shapes produced by sparse-view 3D generation. It introduces a data-dependent rectified flow that operates in latent space and models the coarse-to-fine refinement directly. A token matching mechanism enforces a one-to-one correspondence between coarse and fine latent codes, and the training data is built from LRM-reconstructed coarse shapes to obtain robust coarse–fine pairs. The system has two stages: a 3D-VAE that encodes shapes into latent tokens, and a Diffusion Transformer conditioned on image prompts. Training minimizes the objective $L(\theta) = \mathbb{E}_{t,z_0,z_1,y}[\| v_\theta(t,z_t,y) - (z_1 - z_0) \|^2]$, and inference takes the form $G_{Fine} = \mathcal{F}(G_{Coarse}, I)$. Across reconstruction, generation, and multiple 3D representations, DetailGen3D achieves higher-fidelity detail with efficient training, enabling practical refinement of coarse 3D shapes for applications in design, simulation, and embodied intelligence.
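The objective and the inference map above can be illustrated with a small NumPy sketch. It is not the authors' implementation and rests on several assumptions that the summary does not state: $z_0$ is taken to be the coarse latent tokens and $z_1$ the matched fine latent tokens, the path is the standard rectified-flow interpolation $z_t = (1-t) z_0 + t z_1$, the condition $y$ is an image embedding, and inference uses a plain Euler solver. The names `interpolate`, `flow_matching_loss`, and `refine` are hypothetical, and `v_theta` stands for any callable velocity model.

```python
import numpy as np

def interpolate(z0, z1, t):
    """Point on the straight path from z0 (t=0) to z1 (t=1).

    Assumption: linear rectified-flow interpolation; the latents have
    shape (batch, tokens, channels) and t has shape (batch,).
    """
    t = t.reshape(-1, 1, 1)  # broadcast t over tokens and channels
    return (1.0 - t) * z0 + t * z1

def flow_matching_loss(v_theta, z0, z1, y, t):
    """Monte Carlo estimate of E[ || v_theta(t, z_t, y) - (z1 - z0) ||^2 ]."""
    z_t = interpolate(z0, z1, t)
    residual = v_theta(t, z_t, y) - (z1 - z0)
    # Squared L2 norm per sample, averaged over the batch.
    return float(np.mean(np.sum(residual ** 2, axis=(1, 2))))

def refine(v_theta, z_coarse, y, num_steps=10):
    """Integrate dz/dt = v_theta(t, z, y) from t=0 to t=1 with Euler steps.

    Assumption: a fixed-step Euler solver; the solver actually used by
    the method is not given in the summary.
    """
    z = z_coarse.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full(z.shape[0], k * dt)
        z = z + dt * v_theta(t, z, y)
    return z
```

Under these assumptions the ideal velocity is the constant displacement $z_1 - z_0$, so a model that predicts it exactly has zero loss and `refine` carries the coarse latent onto the fine one. In the full pipeline, the refined latent would then be decoded by the 3D-VAE to obtain the detailed geometry.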
Abstract
Modern 3D generation methods can rapidly create shapes from sparse or single views, but their outputs often lack geometric detail due to computational constraints. We present DetailGen3D, a generative approach specifically designed to enhance these generated 3D shapes. Our key insight is to model the coarse-to-fine transformation directly through data-dependent flows in latent space, avoiding the computational overhead of large-scale 3D generative models. We introduce a token matching strategy that ensures accurate spatial correspondence during refinement, enabling local detail synthesis while preserving global structure. By carefully designing our training data to match the characteristics of synthesized coarse shapes, our method can effectively enhance shapes produced by various 3D generation and reconstruction approaches, from single-view to sparse multi-view inputs. Extensive experiments demonstrate that DetailGen3D achieves high-fidelity geometric detail synthesis while maintaining efficiency in training.
