
DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow

Ken Deng, Yuan-Chen Guo, Jingxiang Sun, Zi-Xin Zou, Yangguang Li, Xin Cai, Yan-Pei Cao, Yebin Liu, Ding Liang

TL;DR

DetailGen3D addresses the lack of geometric detail in shapes produced by sparse-view 3D generation. It introduces a data-dependent rectified flow that operates in latent space to model coarse-to-fine refinement, couples it with a token matching mechanism that enforces a one-to-one correspondence between coarse and fine latent codes, and builds high-quality training data from LRM-reconstructed coarse shapes to obtain robust coarse–fine pairs. The approach is implemented as a two-stage system: a 3D-VAE forms latent tokens, and a Diffusion Transformer conditioned on image prompts learns the flow with the compact objective $L(\theta) = \mathbb{E}_{t,z_0,z_1,y}[\| v_\theta(t,z_t,y) - (z_1 - z_0) \|^2]$, giving the inference form $G_{Fine} = \mathcal{F}(G_{Coarse}, I)$. Across reconstruction, generation, and multiple 3D representations, DetailGen3D produces higher-fidelity detail with efficient training, enabling practical refinement of coarse 3D shapes for applications in design, simulation, and embodied intelligence.
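To make the training objective concrete, here is a minimal Monte-Carlo sketch of $L(\theta)$. It assumes the standard rectified-flow interpolation $z_t = (1-t)\,z_0 + t\,z_1$ (consistent with the regression target $z_1 - z_0$), and it assumes that, because the flow is data-dependent, $z_0$ holds the coarse latent tokens and $z_1$ the matched fine latent tokens rather than Gaussian noise. The function name `rectified_flow_loss` and the token shapes are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def rectified_flow_loss(v_theta, z0, z1, y, rng):
    """Monte-Carlo estimate of E ||v_theta(t, z_t, y) - (z1 - z0)||^2.

    Assumption: z0 = coarse latent tokens, z1 = fine latent tokens,
    both of shape (batch, num_tokens, channels), matched index-by-index.
    """
    batch = z0.shape[0]
    t = rng.uniform(0.0, 1.0, size=(batch, 1, 1))  # one time sample per pair
    zt = (1.0 - t) * z0 + t * z1                    # straight path from coarse to fine
    target = z1 - z0                                 # constant velocity along that path
    pred = v_theta(t, zt, y)
    # Squared L2 norm per sample, averaged over the batch.
    return np.mean(np.sum((pred - target) ** 2, axis=(1, 2)))
```

An oracle predictor that returns `z1 - z0` drives the loss to zero, while a predictor that always returns zeros pays the full squared distance between coarse and fine tokens.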

Abstract

Modern 3D generation methods can rapidly create shapes from sparse or single views, but their outputs often lack geometric detail due to computational constraints. We present DetailGen3D, a generative approach specifically designed to enhance these generated 3D shapes. Our key insight is to model the coarse-to-fine transformation directly through data-dependent flows in latent space, avoiding the computational overhead of large-scale 3D generative models. We introduce a token matching strategy that ensures accurate spatial correspondence during refinement, enabling local detail synthesis while preserving global structure. By carefully designing our training data to match the characteristics of synthesized coarse shapes, our method can effectively enhance shapes produced by various 3D generation and reconstruction approaches, from single-view to sparse multi-view inputs. Extensive experiments demonstrate that DetailGen3D achieves high-fidelity geometric detail synthesis while maintaining efficiency in training.
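The token matching idea can be illustrated with a toy encoder. The sketch assumes, as the Figure 2 caption suggests, that each latent token summarizes the space around its query point, and that matching means encoding the coarse and fine shapes with one shared set of query points (sampled here from the coarse shape, since only it is available at inference). The encoder `toy_encode`, which averages the k nearest surface points, is a hypothetical stand-in for the 3D-VAE, and the "coarse" shape is simply a noisy copy of the fine one.

```python
import numpy as np

def toy_encode(surface_pts, query_pts, k=8):
    """Hypothetical stand-in for a 3D-VAE encoder: token i is the mean of the
    k surface points nearest to query point i, so it describes that region."""
    d = np.linalg.norm(query_pts[:, None, :] - surface_pts[None, :, :], axis=-1)  # (M, P)
    idx = np.argsort(d, axis=1)[:, :k]                                            # (M, k)
    return surface_pts[idx].mean(axis=1)                                          # (M, 3)

rng = np.random.default_rng(0)
fine = rng.normal(size=(2048, 3))
fine /= np.linalg.norm(fine, axis=1, keepdims=True)   # points on the unit sphere
coarse = fine + 0.02 * rng.normal(size=fine.shape)    # toy "coarse" version of the shape

# Matched: both shapes are encoded with the SAME query points.
queries = coarse[rng.choice(len(coarse), 256, replace=False)]
z_coarse = toy_encode(coarse, queries)
z_fine = toy_encode(fine, queries)

# Unmatched: the fine shape gets its own, independently sampled query points.
z_fine_unmatched = toy_encode(fine, fine[rng.choice(len(fine), 256, replace=False)])

matched_gap = np.linalg.norm(z_coarse - z_fine, axis=1).mean()
unmatched_gap = np.linalg.norm(z_coarse - z_fine_unmatched, axis=1).mean()
```

With shared queries, token i of the coarse shape and token i of the fine shape describe the same local region, so `matched_gap` stays small; with independent queries the index-wise pairs point to unrelated regions and `unmatched_gap` is far larger. This index-wise correspondence is what lets a flow learn local, per-token corrections.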

Paper Structure

This paper contains 29 sections, 9 equations, 9 figures, 2 tables, 1 algorithm.

Figures (9)

  • Figure 1: Our method demonstrates effective geometry refinement across various tasks and representations. In the images, the coarse geometry is displayed in gray, while the refined geometry produced by our approach is shown in red. On the right, zoomed-in details are provided to better show the refinement effects.
  • Figure 2: (1) Inference pipeline. We use a 3D-VAE to extract tokens from the generated or reconstructed coarse geometry, then feed the coarse tokens and the DINOv2 features oquab2024dinov2learningrobustvisual of the image prompt into a DiT Peebles2022DiT. After refinement, the predicted tokens are decoded by the 3D-VAE decoder to obtain the refined geometry. Inference takes only a few seconds. (2) Training data: the coarse geometry is the LRM reconstruction from multi-view images rendered from the fine geometry. (3) Left: the token matching process. Right: the top result uses only the query points located in the first quadrant, while the bottom result uses all query points, demonstrating that each token represents the space around its corresponding query point.
  • Figure 3: We apply our method to input meshes reconstructed by different approaches (Instant3D li2023instant3dfasttextto3dsparseview, CRM wang2024crmsingleimage3d, InstantMesh xu2024instantmeshefficient3dmesh). Coarse inputs are shown alongside the fine refinement results from our method. The top three objects are from Objaverse, while the bottom object is from GSO. More results can be found in the supplementary material.
  • Figure 4: We apply our method to input meshes generated by different approaches (TripoSR tochilkin2024triposrfast3dobject, CRM wang2024crmsingleimage3d, InstantMesh xu2024instantmeshefficient3dmesh). Coarse inputs are shown alongside the fine refinement results from our method. The top three objects are from Objaverse, while the bottom object is from GSO. More results can be found in the supplementary material.
  • Figure 5: We apply our method to input meshes generated by different approaches (TripoSR tochilkin2024triposrfast3dobject, CRM wang2024crmsingleimage3d, InstantMesh xu2024instantmeshefficient3dmesh) with GPTEval3D as input. Coarse inputs are shown alongside the fine refinement results from our method. More results can be found in the supplementary material.
  • ...and 4 more figures
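The inference pipeline in Figure 2 (1), $G_{Fine} = \mathcal{F}(G_{Coarse}, I)$, can be sketched as a composition of callables. This is a sketch under stated assumptions: the encoder, image-feature extractor (DINOv2 in the paper), velocity network (the DiT), and decoder are passed in as generic functions, and the ODE $dz/dt = v(t, z, y)$ is integrated from the coarse tokens at $t = 0$ to $t = 1$ with forward Euler, which is an illustrative solver choice rather than a confirmed detail of the method. The names `refine_tokens` and `detail_pipeline` are hypothetical.

```python
import numpy as np

def refine_tokens(z_coarse, cond, velocity_fn, num_steps=10):
    """Forward-Euler integration of dz/dt = velocity_fn(t, z, cond),
    starting from the coarse tokens at t = 0 and stopping at t = 1."""
    z = np.array(z_coarse, dtype=float, copy=True)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        z = z + dt * velocity_fn(t, z, cond)
    return z

def detail_pipeline(coarse_geometry, image, encode, image_features,
                    velocity_fn, decode, num_steps=10):
    """Hypothetical wiring of the stages: encode -> condition -> refine -> decode."""
    z0 = encode(coarse_geometry)   # coarse latent tokens (3D-VAE encoder in the paper)
    y = image_features(image)      # image-prompt features (DINOv2 in the paper)
    z1_hat = refine_tokens(z0, y, velocity_fn, num_steps)
    return decode(z1_hat)          # refined geometry (3D-VAE decoder in the paper)
```

If the learned flow were perfectly straight, the velocity would be constant along each path and a single Euler step would already land on the fine tokens; this is one reason rectified flows are attractive for inference that takes only a few seconds.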