BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion
Yonghao Yu, Shunan Zhu, Huai Qin, Haorui Li
TL;DR
BoostDream is a three-stage, plug-and-play refinement framework that accelerates high-quality text-to-3D generation by combining fast feed-forward initialization with multi-view diffusion-based refinement. It introduces a 3D representation distillation step, a multi-view render system, and a multi-view SDS loss (MV-SDS) with normal-map guidance and orientation/opacity terms, enabling refinement across NeRF, DMTet, and 3D Gaussian Splatting representations. The approach addresses the Janus problem, enhances detail through self-guided refinement, and requires significantly fewer training iterations than SDS-only methods, as confirmed by experiments, ablations, and user studies. The work offers a practical path to efficient, high-fidelity 3D asset generation for VR, gaming, and related industries, and it generalizes across diverse differentiable 3D representations.
Abstract
Driven by the evolution of text-to-image diffusion models, text-to-3D generation has made significant strides. Two primary paradigms currently dominate the field: feed-forward generation, which produces 3D assets quickly but often yields coarse results, and Score Distillation Sampling (SDS) based solutions, which generate high-fidelity 3D assets but at a slower pace. Integrating the two holds substantial promise for advancing 3D generation. In this paper, we present BoostDream, a highly efficient plug-and-play 3D refining method designed to transform coarse 3D assets into high-quality ones. The BoostDream framework comprises three distinct processes: (1) We introduce 3D model distillation, which fits differentiable representations to the 3D assets obtained through feed-forward generation. (2) We design a novel multi-view SDS loss, which uses a multi-view aware 2D diffusion model to refine the 3D assets. (3) We propose to use the prompt and multi-view consistent normal maps as guidance during refinement. Extensive experiments on different differentiable 3D representations show that BoostDream generates high-quality 3D assets rapidly and overcomes the Janus problem, in contrast to conventional SDS-based methods. These results represent a substantial advance in both the efficiency and quality of 3D generation.
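To make the idea behind step (2) concrete, the following is a minimal NumPy sketch of a generic SDS-style update applied jointly to several rendered views. It is not the paper's MV-SDS loss: the function names, the noise schedule, the weighting w(t) = 1 - alpha_bar_t, and the toy noise predictor are all assumptions made for illustration. A real pipeline would use a pretrained multi-view aware diffusion model conditioned on the prompt and normal maps, and would backpropagate the gradient through a differentiable renderer into the 3D representation rather than updating the views directly.

```python
import numpy as np

def sds_grad_multiview(renders, predict_noise, t, alphas_cumprod, rng):
    """Generic SDS-style gradient with respect to a batch of rendered views.

    renders: array of shape (V, H, W, C), V views of the same asset.
    predict_noise: hypothetical callable (noisy_views, t) -> noise estimate
        of the same shape; it stands in for a multi-view aware diffusion model.
    """
    a_t = alphas_cumprod[t]
    eps = rng.standard_normal(renders.shape)            # noise shared by the forward step
    noisy = np.sqrt(a_t) * renders + np.sqrt(1.0 - a_t) * eps
    eps_pred = predict_noise(noisy, t)                  # all views are denoised jointly
    w_t = 1.0 - a_t                                     # assumed weighting choice
    return w_t * (eps_pred - eps)

# Toy stand-in for the diffusion model (assumption): it "knows" a clean target
# and returns the noise that would explain the noisy input given that target.
rng = np.random.default_rng(0)
alphas_cumprod = np.linspace(0.999, 0.01, 1000)
target = rng.uniform(0.0, 1.0, size=(4, 8, 8, 3))       # 4 views, 8x8 RGB

def toy_predict_noise(noisy, t):
    a_t = alphas_cumprod[t]
    return (noisy - np.sqrt(a_t) * target) / np.sqrt(1.0 - a_t)

# "Coarse" views: the target corrupted by a perturbation.
views = target + 0.5 * rng.standard_normal(target.shape)
initial_err = float(np.mean((views - target) ** 2))

# Simplification: gradient descent directly on the views instead of on 3D parameters.
for _ in range(200):
    t = int(rng.integers(20, 980))
    views = views - 0.5 * sds_grad_multiview(
        views, toy_predict_noise, t, alphas_cumprod, rng
    )

final_err = float(np.mean((views - target) ** 2))
```

With this toy predictor the sampled noise cancels in `eps_pred - eps`, so each step pulls the views toward the target; with a real diffusion model the gradient is noisy, and averaging over many timesteps and camera poses is what drives the refinement.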
