DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision

Xiandong Zou, Ruihao Xia, Hongsong Wang, Pan Zhou

TL;DR

DreamCS tackles the misalignment between human preferences and 3D geometry in text-to-3D generation by introducing 3D-MeshPref, a large-scale unpaired 3D mesh dataset, and RewardCS, a geometry-aware reward model learned via a Cauchy–Schwarz divergence objective $D_{CS}(p(\boldsymbol{x})\|p(\boldsymbol{y}))$ with empirical estimate $\hat{D}_{CS}$. It then integrates RewardCS into both implicit and explicit 3D backbones through differentiable meshization and adaptive mesh fusion with progressive reward guidance, formalized as $\mathcal{L}(\psi_t) = \mathcal{L}_{\text{SDS}}(\psi_t) - \alpha(t) \cdot r_{\boldsymbol{\theta}}(d(\psi_t)\mid c)$. On GPTEval3D, DreamCS improves geometric alignment, 3D plausibility, and geometry-text alignment across DreamFusion and MVDream, while reducing Janus artifacts and remaining compatible with 2D-guided baselines. The approach offers a scalable, geometry-focused alternative to 2D reward signals and is poised to enhance real-world text-to-3D asset generation, with code and models to be released publicly.
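To make the empirical estimate $\hat{D}_{CS}$ more concrete, here is a minimal sketch of the standard kernel-density estimator of the Cauchy–Schwarz divergence between two unpaired sample sets. It is not the authors' implementation: the function names, the Gaussian kernel, and the bandwidth `sigma` are assumptions made for illustration, and how RewardCS applies the estimator to 3D-MeshPref is not specified here.

```python
import numpy as np


def gaussian_gram(a, b, sigma):
    """Gaussian kernel matrix between the rows of a (M, d) and b (N, d)."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def cs_divergence(x, y, sigma=1.0):
    """Kernel estimate of the Cauchy-Schwarz divergence from unpaired samples.

    Estimates log(int p^2) + log(int q^2) - 2 * log(int p q) by replacing each
    integral with the mean of a Gaussian Gram matrix (assumed kernel choice).
    x and y may contain different numbers of samples.
    """
    k_xx = gaussian_gram(x, x, sigma).mean()
    k_yy = gaussian_gram(y, y, sigma).mean()
    k_xy = gaussian_gram(x, y, sigma).mean()
    return np.log(k_xx) + np.log(k_yy) - 2.0 * np.log(k_xy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    preferred = rng.normal(0.0, 1.0, size=(64, 1))      # toy scalar samples
    dispreferred = rng.normal(3.0, 1.0, size=(48, 1))   # different sample count
    print(cs_divergence(preferred, dispreferred))
```

Because the estimator only needs within-set and cross-set kernel means, the two sample sets do not have to be matched one-to-one, which is the property that makes an unpaired objective possible.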

Abstract

While text-to-3D generation has attracted growing interest, existing methods often struggle to produce 3D assets that align well with human preferences. Current preference alignment techniques for 3D content typically rely on hard-to-collect preference-paired multi-view 2D images to train 2D reward models, which then guide 3D generation -- leading to geometric artifacts due to their inherent 2D bias. To address these limitations, we construct 3D-MeshPref, the first large-scale unpaired 3D preference dataset, featuring diverse 3D meshes annotated by a large language model and refined by human evaluators. We then develop RewardCS, the first reward model trained directly on unpaired 3D-MeshPref data using a novel Cauchy-Schwarz divergence objective, enabling effective learning of human-aligned 3D geometric preferences without requiring paired comparisons. Building on this, we propose DreamCS, a unified framework that integrates RewardCS into text-to-3D pipelines -- enhancing both implicit and explicit 3D generation with human preference feedback. Extensive experiments show DreamCS outperforms prior methods, producing 3D assets that are both geometrically faithful and human-preferred. Code and models will be released publicly.
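The reward-guided objective $\mathcal{L}(\psi_t) = \mathcal{L}_{\text{SDS}}(\psi_t) - \alpha(t) \cdot r_{\boldsymbol{\theta}}(d(\psi_t)\mid c)$ can be sketched in a few lines. The form of $\alpha(t)$ is not given in this summary, so the linear ramp below is purely a hypothetical stand-in for "progressive" guidance. Treating $t$ as an optimization step is also an assumption, and `sds_loss` and `reward` are placeholder scalars rather than outputs of a real SDS pipeline or of RewardCS.

```python
def alpha_schedule(step, total_steps, alpha_max=1.0):
    """Hypothetical linear ramp for alpha(t): 0 at the start, alpha_max at the end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return alpha_max * frac


def guided_loss(sds_loss, reward, step, total_steps, alpha_max=1.0):
    """L = L_SDS - alpha(t) * r, with the reward term subtracted from the SDS loss."""
    return sds_loss - alpha_schedule(step, total_steps, alpha_max) * reward


if __name__ == "__main__":
    # Placeholder values: a fixed SDS loss and a fixed reward score.
    for step in (0, 50, 100):
        print(step, guided_loss(sds_loss=2.0, reward=0.5, step=step, total_steps=100))
```

Subtracting the reward means that minimizing the total loss pushes the reward up, and a schedule that starts near zero lets the SDS term shape the coarse geometry before the reward model is given weight.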

Paper Structure

This paper contains 27 sections, 2 theorems, 29 equations, 30 figures, 7 tables.

Key Result

Theorem 1

Suppose the paper's assumption holds. With a constant $C>0$, the empirical CS divergences $\hat{D}_{\mathrm{CS}}^{\mathrm{paired}}$ and $\hat{D}_{\mathrm{CS}}^{\mathrm{unpaired}}$ computed from paired and unpaired data satisfy:

Figures (30)

  • Figure 1: Comparison of 2D- vs. 3D-based reward models. (a) The 2D reward model ImageReward assigns high scores to geometrically flawed 3D assets, while our 3D reward model RewardCS better aligns with human preference. (b) DreamFusion guided by the 2D reward model Reward3D produces 3D assets with geometric defects and the Janus problem, while DreamFusion with our RewardCS yields geometrically consistent 3D content.
  • Figure 2: Annotated score distribution of meshes in 3D-MeshPref.
  • Figure 3: RewardCS is trained on 3D-MeshPref using a Cauchy-Schwarz divergence objective.
  • Figure 4: Framework of DreamCS: integrate RewardCS into the SDS model for NeRF optimization.
  • Figure 5: Comparisons with 1-stage generation pipelines (DreamFusion and MVDream) and 2-stage generation pipelines (Magic3D and Fantasia3D). More visualizations are provided in the appendix.
  • ...and 25 more figures

Theorems & Definitions (3)

  • Theorem 1: Asymptotic Equivalence
  • Theorem 2: Asymptotic Equivalence
  • proof