DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization

Zhenglin Zhou, Xiaobo Xia, Fan Ma, Hehe Fan, Yi Yang, Tat-Seng Chua

TL;DR

DreamDPO tackles misalignment between text-to-3D generation and human preferences by replacing pointwise quality evaluation with direct preference optimization. It builds online pairwise examples using Gaussian-noised renders, ranks them with reward models or large multimodal models, and updates the 3D representation through a piecewise, preference-driven loss that stabilizes training. Empirically, it achieves competitive or superior results versus 13 baselines on GPTEval3D across text-asset alignment, 3D plausibility, and texture-geometry details, while offering improved controllability. The approach reduces dependency on precise absolute scores and opens avenues for explicit guidance via LMMs, making 3D content generation more human-aligned and versatile in practice.

Abstract

Text-to-3D generation automates 3D content creation from textual descriptions, which offers transformative potential across various fields. However, existing methods often struggle to align generated content with human preferences, limiting their applicability and flexibility. To address these limitations, in this paper, we propose DreamDPO, an optimization-based framework that integrates human preferences into the 3D generation process through direct preference optimization. Practically, DreamDPO first constructs pairwise examples, then compares their alignment with human preferences using reward or large multimodal models, and lastly optimizes the 3D representation with a preference-driven loss function. By leveraging pairwise comparison to reflect preferences, DreamDPO reduces reliance on precise pointwise quality evaluations while enabling fine-grained controllability through preference-guided optimization. Experiments demonstrate that DreamDPO achieves competitive results and provides higher-quality and more controllable 3D content compared to existing methods. The code and models will be open-sourced.
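
The first two steps described above (constructing a pair, then comparing it) can be illustrated with a minimal sketch. It is not the authors' implementation: the names `add_gaussian_noise`, `build_preference_pair`, and `toy_score` are hypothetical, the render is a flat list of pixel values, and `toy_score` is a stand-in for a reward model or large multimodal model. The sketch also scores the noised copies directly, which is a simplifying assumption.

```python
import random


def add_gaussian_noise(render, sigma, rng):
    """Return a copy of a flat list of pixel values with Gaussian noise added."""
    return [p + sigma * rng.gauss(0.0, 1.0) for p in render]


def build_preference_pair(render, score_fn, sigma=0.5, seed=0):
    """Noise one render twice, score both copies, and return (win, lose, gap)."""
    rng = random.Random(seed)
    a = add_gaussian_noise(render, sigma, rng)
    b = add_gaussian_noise(render, sigma, rng)
    score_a, score_b = score_fn(a), score_fn(b)
    # The higher-scoring copy becomes the "win" example; the gap is non-negative.
    if score_a >= score_b:
        return a, b, score_a - score_b
    return b, a, score_b - score_a


def toy_score(image):
    # Hypothetical stand-in for a reward model: prefers brighter images.
    return sum(image) / len(image)


render = [0.2, 0.4, 0.6, 0.8]
win, lose, gap = build_preference_pair(render, toy_score)
```

Only the ordering of the two scores decides which copy is the win example, so the scorer needs to rank reliably rather than produce calibrated absolute values.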

Paper Structure

This paper contains 23 sections, 7 equations, 12 figures, 1 table, 1 algorithm.

Figures (12)

  • Figure 1: Overview of our method. DreamDPO first constructs pairwise examples, then compares their alignment with human preferences using reward or large multimodal models, and lastly optimizes the 3D representation with a preference-driven loss function. The loss function pulls the win example $\mathbf{x}_t^{\text{win}}$ closer and pushes the lose example $\mathbf{x}_t^{\text{lose}}$ away. As a piecewise objective, it selectively pushes $\mathbf{x}_t^{\text{lose}}$ only when the preference score gap $s_\text{gap}$ exceeds a threshold $\tau$, preventing chaotic gradients from overly similar $\mathbf{x}_t^{\text{lose}}$.
  • Figure 2: Qualitative comparisons on the benchmark of GPTEval3D [wu2024gpt]. Existing methods struggle with text matching, as marked in red. DreamDPO improves text matching, which provides better human preference results. (Zoom in to see the details.)
  • Figure 3: Qualitative comparisons with MVDream [shi2023mvdream]. DreamDPO performs well across short to long prompts, offering better human preference results, marked in red. (Zoom in to see the details.)
  • Figure 4: The analysis of backbone. We present the results of DreamDPO using Stable Diffusion v2.1 (SD2.1) [rombach2022high]. DreamDPO demonstrates effective performance with SD2.1, highlighting its potential to leverage more advanced backbone diffusion models for further improvements.
  • Figure 5: The analysis of reward models. We present the results of DreamDPO using ImageReward [xu2024imagereward]. DreamDPO demonstrates effective performance with ImageReward, highlighting its potential to leverage stronger reward models to further enhance generation quality.
  • ...and 7 more figures
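
The piecewise behaviour described in the Figure 1 caption can be sketched as follows. This is an illustrative simplification, not the paper's loss: it uses plain squared distances between vectors, and the names `squared_distance`, `piecewise_preference_loss`, `current`, `s_gap`, and `tau` are assumptions chosen to mirror the caption's symbols.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def piecewise_preference_loss(current, win, lose, s_gap, tau):
    """Pull `current` toward `win`; push it away from `lose` only if s_gap > tau."""
    pull = squared_distance(current, win)
    if s_gap > tau:
        # The pair is clearly separated, so the lose example contributes a push term.
        return pull - squared_distance(current, lose)
    # The pair is too similar: drop the push term and keep only the pull term.
    return pull


current, win, lose = [0.0, 0.0], [1.0, 0.0], [0.0, 2.0]
loss_far = piecewise_preference_loss(current, win, lose, s_gap=0.5, tau=0.1)    # -3.0
loss_near = piecewise_preference_loss(current, win, lose, s_gap=0.05, tau=0.1)  # 1.0
```

When the score gap is below the threshold, the two examples are nearly indistinguishable, so the sketch drops the push term, reflecting the caption's point about avoiding chaotic gradients from overly similar lose examples.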