PlacidDreamer: Advancing Harmony in Text-to-3D Generation

Shuo Huang, Shikun Sun, Zixuan Wang, Xiaoyu Qin, Yanmin Xiong, Yuan Zhang, Pengfei Wan, Di Zhang, Jia Jia

TL;DR

PlacidDreamer is a text-to-3D framework that harmonizes initialization, multi-view generation, and text-conditioned generation with a single multi-view diffusion model, and that uses a novel score distillation algorithm to achieve balanced saturation.

Abstract

Recently, text-to-3D generation has attracted significant attention, resulting in notable performance improvements. Previous methods use end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details with score distillation algorithms. However, these methods exhibit two limitations. First, they encounter conflicts in generation directions, since different models aim to produce diverse 3D assets. Second, the issue of over-saturation in score distillation has not been thoroughly investigated or solved. To address these limitations, we propose PlacidDreamer, a text-to-3D framework that harmonizes initialization, multi-view generation, and text-conditioned generation with a single multi-view diffusion model, while simultaneously employing a novel score distillation algorithm to achieve balanced saturation. To unify the generation direction, we introduce the Latent-Plane module, a training-friendly plug-in extension that enables multi-view diffusion models to provide fast geometry reconstruction for initialization and enhanced multi-view images to personalize the text-to-image diffusion model. To address the over-saturation problem, we propose to view score distillation as a multi-objective optimization problem and introduce the Balanced Score Distillation (BSD) algorithm, which offers a Pareto Optimal solution that achieves both rich details and balanced saturation. Extensive experiments validate the capabilities of PlacidDreamer. The code is available at https://github.com/HansenHuang0823/PlacidDreamer.

Paper Structure

This paper contains 25 sections, 24 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: 3D generations of PlacidDreamer.
  • Figure 2: (a) The pipeline of PlacidDreamer. (b) Score distillation can be decomposed into two directions: classifier guidance $\delta_\mathrm{CG}$ and smoothing guidance $\delta_\mathrm{SG}$. CSD [yu2023text] uses only classifier guidance. In more than 30% of cases, the angle between these two guidance vectors is obtuse. In such cases, using a fixed CFG parameter in SDS may result in negative optimization in the $\delta_\mathrm{SG}$ direction, leading to over-saturation. In contrast, the BSD algorithm ensures that each optimization step is non-negative in both directions. (c) The integration of the Latent-Plane module with multi-view diffusion models.
  • Figure 3: 2D generation results of score distillation algorithms, annotated with computational costs. "Forward" denotes one forward pass of the diffusion model. The results of our BSD closely resemble the text-to-image ground truth. BSD converges at Pareto Optimal points, so its results maintain balanced saturation even under over-training.
  • Figure 4: Qualitative comparison with baseline methods. More comparisons with recent methods that use NeRF [mildenhall2021nerf] as the 3D representation are provided in Appendix \ref{sec:appendix_qualitative}.
  • Figure 5: Results of ablation studies. In the first row, we evaluate PlacidDreamer with each component removed individually. In the second row, we investigate the impact of different $\lambda$ values, validating that BSD enables stable control of the balance between color saturation and detail level.
  • ...and 6 more figures
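
The geometric point in the Figure 2(b) caption can be illustrated with a small sketch. It is not the paper's BSD algorithm. It uses the standard closed-form minimum-norm convex combination of two vectors from multi-objective optimization, which is one known way to obtain an update with a non-negative inner product against both directions. The vectors `d_cg` and `d_sg`, the fixed-weight form `d_sg + omega * d_cg`, and all function names are assumptions made for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def fixed_weight_step(d_cg, d_sg, omega):
    """Assumed fixed-weight update: d_sg + omega * d_cg (illustrative form)."""
    return [s + omega * c for c, s in zip(d_cg, d_sg)]


def min_norm_step(d_cg, d_sg):
    """Minimum-norm point of the segment between d_cg and d_sg.

    Returns (step, w) with step = w * d_cg + (1 - w) * d_sg and w in [0, 1].
    Because step is the projection of the origin onto that segment, it
    satisfies dot(step, d_cg) >= 0 and dot(step, d_sg) >= 0.
    """
    diff = [s - c for c, s in zip(d_cg, d_sg)]  # d_sg - d_cg
    denom = dot(diff, diff)
    if denom == 0.0:
        # Both directions coincide, so any convex weight gives the same step.
        return list(d_cg), 1.0
    w = dot(diff, d_sg) / denom      # unconstrained minimizer of the norm
    w = min(1.0, max(0.0, w))        # clip to keep a convex combination
    step = [w * c + (1.0 - w) * s for c, s in zip(d_cg, d_sg)]
    return step, w


# Hypothetical toy vectors whose angle is obtuse (negative inner product).
d_cg = [1.0, 0.0]
d_sg = [-1.0, 0.5]

fixed = fixed_weight_step(d_cg, d_sg, omega=7.5)
balanced, weight = min_norm_step(d_cg, d_sg)

print("fixed-weight progress along d_sg:", dot(fixed, d_sg))
print("min-norm progress along d_cg:", dot(balanced, d_cg))
print("min-norm progress along d_sg:", dot(balanced, d_sg))
```

With these toy vectors the fixed-weight update has a negative inner product with `d_sg`, which mirrors the "negative optimization" case the caption describes, while the minimum-norm combination keeps both inner products non-negative. How BSD actually selects its weights, and how $\lambda$ enters, is defined in the paper and is not reproduced here.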