DreamReward: Text-to-3D Generation with Human Preference
Junliang Ye, Fangfu Liu, Qixiu Li, Zhengyi Wang, Yikai Wang, Xinzhou Wang, Yueqi Duan, Jun Zhu
TL;DR
DreamReward introduces Reward3D, a 3D-aware human-preference reward model, and DreamFL (Reward3D Feedback Learning), a reward-guided fine-tuning method that aligns multi-view diffusion models with human judgments. By building a labeled dataset of expert 3D comparisons and a scalable preference model trained on it, the approach enables direct optimization of diffusion-based 3D generation toward human-preferred aesthetics and prompt fidelity. Extensive experiments show that DreamFL and Reward3D outperform baselines on prompt alignment, visual quality, and 3D consistency, and that Reward3D can also serve as a lightweight automatic evaluator. The framework demonstrates the potential of learning from human feedback to close the gap between text prompts and high-quality, view-consistent 3D outputs in diffusion-driven 3D synthesis.
Abstract
3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, coined DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons using a systematic annotation pipeline that includes both rating and ranking. Then, we build Reward3D, the first general-purpose text-to-3D human preference reward model, to effectively encode human preferences. Building upon this 3D reward model, we perform a theoretical analysis and present Reward3D Feedback Learning (DreamFL), a direct tuning algorithm that optimizes multi-view diffusion models with a redefined scorer. Supported by theoretical proof and extensive experimental comparisons, DreamReward generates high-fidelity, 3D-consistent results with significant improvements in prompt alignment with human intention. Our results demonstrate the great potential of learning from human feedback to improve text-to-3D models.
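The abstract does not state the objective used to train Reward3D on the ranked expert comparisons. As an illustration only, the sketch below shows the pairwise Bradley-Terry loss that is commonly used to fit preference reward models from rankings: each best-to-worst ranking is expanded into (preferred, rejected) pairs, and the loss is -log sigmoid(r_preferred - r_rejected). The function names (`ranking_to_pairs`, `pairwise_loss`, `ranking_loss`) are hypothetical, and the reward scores are plain floats standing in for the output of a reward network; this is an assumption, not the paper's confirmed formulation.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_ids):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    Hypothetical helper: assumes annotators produced a strict ranking.
    """
    return list(combinations(ranked_ids, 2))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_loss(score_preferred, score_rejected):
    """Bradley-Terry negative log-likelihood: -log sigmoid(r_w - r_l)."""
    return softplus(score_rejected - score_preferred)


def ranking_loss(scores, ranked_ids):
    """Mean pairwise loss over all pairs implied by one ranking.

    `scores` maps each generated result to a scalar reward; in practice
    these would come from a reward model (assumed here, not shown).
    """
    pairs = ranking_to_pairs(ranked_ids)
    total = sum(pairwise_loss(scores[w], scores[l]) for w, l in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    scores = {"a": 2.0, "b": 1.0, "c": 0.0}
    # Scores that agree with the human ranking give a low loss...
    print(ranking_loss(scores, ["a", "b", "c"]))
    # ...and scores that contradict it give a higher loss.
    print(ranking_loss(scores, ["c", "b", "a"]))
```

Minimizing this loss pushes the reward of each preferred result above that of each rejected one; the softplus form avoids overflow when the score margin is large.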
