DreamReward: Text-to-3D Generation with Human Preference

Junliang Ye, Fangfu Liu, Qixiu Li, Zhengyi Wang, Yikai Wang, Xinzhou Wang, Yueqi Duan, Jun Zhu

TL;DR

DreamReward introduces Reward3D, a 3D-aware human-preference reward model, and DreamFL, a reward-guided fine-tuning method that aligns multi-view 3D generation with human judgments. By building a labeled 3D dataset and a scalable preference model, the approach enables direct optimization of diffusion-based 3D generation toward human-like aesthetics and prompt fidelity. Extensive experiments show DreamFL and Reward3D outperform baselines on alignment, quality, and consistency, with Reward3D serving as a lightweight automatic evaluator. The framework demonstrates the potential of learning from human feedback to close the gap between text prompts and high-quality, view-consistent 3D outputs in diffusion-driven 3D synthesis.

Abstract

3D content creation from text prompts has shown remarkable success recently. However, current text-to-3D methods often generate 3D results that do not align well with human preferences. In this paper, we present a comprehensive framework, coined DreamReward, to learn and improve text-to-3D models from human preference feedback. To begin with, we collect 25k expert comparisons based on a systematic annotation pipeline including rating and ranking. Then, we build Reward3D, the first general-purpose text-to-3D human preference reward model, to effectively encode human preferences. Building upon the 3D reward model, we finally perform theoretical analysis and present Reward3D Feedback Learning (DreamFL), a direct tuning algorithm to optimize multi-view diffusion models with a redefined scorer. Grounded in theoretical proof and extensive experimental comparisons, DreamReward successfully generates high-fidelity and 3D-consistent results with significant boosts in prompt alignment with human intention. Our results demonstrate the great potential of learning from human feedback to improve text-to-3D models.
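
As a rough illustration of how a reward model can be fit to pairwise expert comparisons like those described above, the sketch below uses a Bradley-Terry style ranking loss. This is an assumption for illustration only: the abstract does not state which loss Reward3D is trained with, and the names `pairwise_preference_loss` and `mean_preference_loss` are hypothetical.

```python
import math
from typing import Iterable, Tuple


def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(score_preferred - score_rejected)).

    Assumption: this is a common choice for comparison data, not necessarily
    the loss used to train Reward3D.
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average the loss over (preferred_score, rejected_score) pairs."""
    losses = [pairwise_preference_loss(p, r) for p, r in pairs]
    return sum(losses) / len(losses)
```

The loss is small when the preferred asset already scores well above the rejected one and grows when the ordering is reversed, which pushes the scorer toward agreeing with the annotators' rankings.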

Paper Structure (41 sections, 14 equations, 12 figures, 4 tables, 1 algorithm)

This paper contains 41 sections, 14 equations, 12 figures, 4 tables, and 1 algorithm.

Figures (12)

  • Figure 1: The overall framework of our DreamReward. (Top) Reward3D involves data collection, annotation, and preference learning. (Bottom) DreamFL utilizes feedback from Reward3D to compute RewardLoss and incorporate it into the SDS loss for simultaneous optimization of NeRF.
  • Figure 2: Representative examples from our constructed 3D dataset, along with the scores assigned by Reward3D. Reward3D gives lower scores to 3D assets deviating from the prompt description.
  • Figure 3: The utilization of Reward3D in scoring both positive examples and negative examples (left: inconsistency, right: multi-face issue) reveals that the model can effectively distinguish negative examples.
  • Figure 4: Comparison with four baselines. The results indicate that existing 3D generation models do not align well with human preferences (as highlighted in red). Conversely, our DreamReward results conform more closely to human preferences.
  • Figure 5: More generated results using our DreamReward. Our work can generate 3D assets of higher alignment, while maintaining consistency across multiple perspectives.
  • ...and 7 more figures
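
The Figure 1 caption describes DreamFL as computing a RewardLoss from Reward3D feedback and incorporating it into the SDS loss. A minimal sketch of that idea as a weighted sum follows. The function name `combined_loss`, the `reward_weight` parameter, and the definition of the reward term as the negative mean reward over rendered views are all assumptions; the paper's actual formulation is derived from its theoretical analysis and may differ.

```python
from typing import Sequence


def combined_loss(
    sds_loss: float,
    view_rewards: Sequence[float],
    reward_weight: float = 1.0,
) -> float:
    """Weighted sum of an SDS loss and a reward-based term (illustrative only)."""
    # Hypothetical RewardLoss: negative mean reward across rendered views,
    # so that minimizing the total loss increases the reward.
    reward_loss = -sum(view_rewards) / len(view_rewards)
    return sds_loss + reward_weight * reward_loss
```

With `reward_weight = 0` this reduces to plain SDS optimization, so the weight controls how strongly the preference signal steers the 3D representation.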