DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision
Xiandong Zou, Ruihao Xia, Hongsong Wang, Pan Zhou
TL;DR
DreamCS tackles the misalignment between human preferences and 3D geometry in text-to-3D generation by introducing 3D-MeshPref, a large-scale unpaired 3D mesh dataset, and RewardCS, a geometry-aware reward model learned via a Cauchy–Schwarz divergence objective $D_{CS}(p(\boldsymbol{x})\|p(\boldsymbol{y}))$ with empirical estimate $\hat{D}_{CS}$. It then integrates RewardCS into both implicit and explicit 3D backbones through differentiable meshization and adaptive mesh fusion with progressive reward guidance, formalized as $\mathcal{L}(\psi_t) = \mathcal{L}_{\text{SDS}}(\psi_t) - \alpha(t) \cdot r_{\boldsymbol{\theta}}(d(\psi_t)\mid c)$. On GPTEval3D, DreamCS improves geometric alignment, 3D plausibility, and geometry-text alignment across DreamFusion and MVDream, while reducing Janus artifacts and remaining compatible with 2D-guided baselines. The approach offers a scalable, geometry-focused alternative to 2D reward signals and is poised to enhance real-world text-to-3D asset generation, with code and models to be released publicly.
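To make the two formulas above concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard kernel plug-in estimator of the Cauchy–Schwarz divergence, $\hat{D}_{CS} = \log \frac{1}{M^2}\sum_{i,j}\kappa(x_i, x_j) + \log \frac{1}{N^2}\sum_{i,j}\kappa(y_i, y_j) - 2\log \frac{1}{MN}\sum_{i,j}\kappa(x_i, y_j)$, with a Gaussian kernel. The summary does not say which estimator or kernel RewardCS actually uses. The function names, the bandwidth `sigma`, and the idea of passing in plain feature vectors are all hypothetical. The schedule $\alpha(t)$ is likewise not specified here, so `guided_loss` takes its current value as an argument.

```python
import math

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between two equal-length feature vectors (assumed kernel choice)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def cs_divergence(xs, ys, sigma=1.0):
    """Kernel plug-in estimate of the Cauchy-Schwarz divergence between two sample sets.

    The two sets need not be paired or equal in size; only set-level
    statistics are compared.
    """
    return (math.log(mean_kernel(xs, xs, sigma))
            + math.log(mean_kernel(ys, ys, sigma))
            - 2.0 * math.log(mean_kernel(xs, ys, sigma)))

def guided_loss(sds_loss, reward, alpha_t):
    """L = L_SDS - alpha(t) * r, with alpha(t) supplied by the caller."""
    return sds_loss - alpha_t * reward

# Toy usage: two small sets of 2-D feature vectors of different sizes.
preferred = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.5]]
dispreferred = [[5.0, 5.0], [6.0, 5.0]]
divergence = cs_divergence(preferred, dispreferred)
```

The estimate is zero when both sample sets are identical and grows as the sets separate, and it never requires matching an individual preferred sample to a dispreferred one, which is what makes a divergence of this kind usable on unpaired data.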
Abstract
While text-to-3D generation has attracted growing interest, existing methods often struggle to produce 3D assets that align well with human preferences. Current preference alignment techniques for 3D content typically rely on hard-to-collect preference-paired multi-view 2D images to train 2D reward models, which then guide 3D generation -- leading to geometric artifacts due to their inherent 2D bias. To address these limitations, we construct 3D-MeshPref, the first large-scale unpaired 3D preference dataset, featuring diverse 3D meshes annotated by a large language model and refined by human evaluators. We then develop RewardCS, the first reward model trained directly on unpaired 3D-MeshPref data using a novel Cauchy-Schwarz divergence objective, enabling effective learning of human-aligned 3D geometric preferences without requiring paired comparisons. Building on this, we propose DreamCS, a unified framework that integrates RewardCS into text-to-3D pipelines -- enhancing both implicit and explicit 3D generation with human preference feedback. Extensive experiments show DreamCS outperforms prior methods, producing 3D assets that are both geometrically faithful and human-preferred. Code and models will be released publicly.
