DreamLCM: Towards High-Quality Text-to-3D Generation via Latent Consistency Model

Yiming Zhong, Xiaolin Zhang, Yao Zhao, Yunchao Wei

TL;DR

This work addresses the persistent over-smoothness in SDS-based text-to-3D generation by introducing DreamLCM, which leverages the Latent Consistency Model to generate high-quality, consistent guidance in a single-step inference. Two strategies—Guidance Calibration using an Euler Solver and a Dual Timestep Strategy—enhance convergence and enable separate optimization of geometry and appearance for Gaussian Splatting representations. DreamLCM maintains the SDS loss while delivering superior detail and training efficiency, outperforming prior methods in both generation quality and cost. The approach offers a practical path toward more reliable, high-fidelity text-to-3D synthesis with end-to-end training and accessible code.

Abstract

Text-to-3D generation has developed rapidly since the introduction of Score Distillation Sampling (SDS). However, SDS tends to generate low-quality 3D objects because of over-smoothing. The issue is attributed to two factors: 1) single-step DDPM inference produces poor guidance gradients; 2) the randomness of the input noise and timesteps averages out the details of the 3D content. To address it, we propose DreamLCM, which incorporates the Latent Consistency Model (LCM). DreamLCM leverages the strong image generation capability of LCM to generate consistent, high-quality guidance, i.e., predicted noises or images. With this improved guidance, the method provides accurate and detailed gradients for optimizing the target 3D models. We further propose two strategies to enhance generation quality. First, a guidance calibration strategy uses an Euler solver to calibrate the guidance distribution, which accelerates the convergence of the 3D models. Second, a dual timestep strategy increases the consistency of the guidance and optimizes 3D models from geometry to appearance. Experiments show that DreamLCM achieves state-of-the-art results in both generation quality and training efficiency. The code is available at https://github.com/1YimingZhong/DreamLCM.
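
To make the phrase "guidance, i.e., predicted noises or images" concrete, the following is a minimal sketch of the standard diffusion relations that link the two forms of guidance, together with the usual SDS-style gradient w(t) * (eps_hat - eps). It is not taken from the DreamLCM code: the function names, the toy list-of-floats "image", and the value alpha_bar_t = 0.64 are assumptions chosen for illustration, and the guidance model is replaced by an oracle that returns the true noise.

```python
import math

def add_noise(x0, eps, alpha_bar_t):
    """Forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps, with a = alpha_bar_t."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]

def predicted_image(x_t, eps_hat, alpha_bar_t):
    """Convert a predicted noise into the equivalent predicted clean image."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(x - b * e) / a for x, e in zip(x_t, eps_hat)]

def sds_gradient(eps_hat, eps, weight=1.0):
    """SDS-style gradient w.r.t. the rendered image: w(t) * (eps_hat - eps)."""
    return [weight * (eh - e) for eh, e in zip(eps_hat, eps)]

# Toy usage (all values are illustrative assumptions).
x0 = [0.5, -1.0]          # stand-in for a rendered image or latent
eps = [0.1, 0.3]          # sampled noise
x_t = add_noise(x0, eps, alpha_bar_t=0.64)

# Oracle guidance: pretend the model predicts the true noise exactly.
x0_hat = predicted_image(x_t, eps, alpha_bar_t=0.64)
grad = sds_gradient(eps, eps)
```

With perfect guidance the predicted image equals the rendered input and the gradient vanishes; any error in the predicted noise shows up directly in the gradient, which is why the quality of single-step guidance matters for SDS.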

Paper Structure (22 sections, 12 equations, 9 figures, 1 algorithm)

Figures (9)

  • Figure 1: Illustration of different guidance generation approaches. $x$ and $\hat{\epsilon}$ indicate the rendered image and the guidance, respectively. (a) SDS generates guidance via a single diffusion model but produces over-smooth results. (b) LucidDreamer utilizes the DDIM inversion technique, forwarding the diffusion model multiple times, where $N=\{2,3,4,5\}$. (c) The proposed DreamLCM method incorporates LCM as the guidance model. We also propose a guidance calibration strategy that uses an Euler solver to refine the guidance $\hat{\epsilon_0^s}$ to $\hat{\epsilon_0^t}$. Our method generates higher-quality guidance than (a) and (b).
  • Figure 2: Illustration of DreamLCM. DreamLCM initializes the 3D model $\theta$ via a text-to-3D generator (Point-E, Nichol et al., 2022; Shap-E, Jun et al., 2023). We use the proposed timestep strategy to divide training into two phases. In the initial phase, we generate guidance directly with a single LCM network. In the refinement phase, we use another LCM network and an Euler solver to calibrate the guidance. We compute the original SDS loss to update $\theta$.
  • Figure 3: Examples generated by DreamLCM. We incorporate the Latent Consistency Model (LCM) as a guidance model, with two proposed strategies to further enhance the generation quality (see the Method section for details). DreamLCM generates high-quality results with fine details.
  • Figure 4: Comparison with state-of-the-art text-to-3D generation methods that use Gaussian Splatting as the 3D representation. The experiments show that DreamLCM generates photo-realistic, high-quality 3D objects with fine details, and that its results are more consistent with the text prompt. Training time is measured on a single RTX 3090 GPU.
  • Figure 5: Ablation study of DreamLCM. Each proposed component is effective and improves text-to-3D generation quality. (1) The results of SDS with a large CFG scale of 100. (2) We incorporate LCM as the guidance model with a small CFG scale of 7.5. (3)(4) The results after adding the Dual Timestep Strategy. It consists of two parts: the Decreasing Timestep Strategy, which reduces the randomness in timesteps, and the Two-phases Strategy, which improves geometry. Both parts are effective. (5) The results after adding Guidance Calibration to column (2). (6) The results after adding both Guidance Calibration and the Dual Timestep Strategy. (7) We perturb the samples with fixed noise to reduce the randomness in the noise and improve the details. Some improved details are highlighted in cyan. The prompts corresponding to the four examples are "a green dragon breathing fire", "a squirrel in samurai armor wielding a katana", "a delicious hamburger" and "A warrior with red cape riding a horse".
  • ...and 4 more figures
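
The captions for Figures 2 and 5 describe a timestep strategy with a decreasing schedule and two training phases, but they do not give the actual schedule. The sketch below shows one possible reading, assuming a linear decrease and a fixed split point. The function names and the values t_max = 980, t_min = 20, and split = 0.5 are hypothetical and are not taken from the paper or its code.

```python
def timestep_for_iteration(it, total_iters, t_max=980, t_min=20):
    """Linearly decrease the diffusion timestep from t_max to t_min.

    Hypothetical schedule: large timesteps early (coarse structure),
    small timesteps late (fine detail). Using a deterministic schedule
    instead of random sampling removes the randomness in timesteps.
    """
    frac = it / max(total_iters - 1, 1)
    return round(t_max - frac * (t_max - t_min))

def phase_for_iteration(it, total_iters, split=0.5):
    """Return 'initial' before the split point and 'refinement' after it.

    The split fraction is an assumption; per the Figure 2 caption, the
    refinement phase is where guidance calibration is applied.
    """
    return "initial" if it < split * total_iters else "refinement"

# Toy usage: print the schedule at a few points of a 100-iteration run.
schedule = [timestep_for_iteration(i, 100) for i in range(100)]
for i in (0, 49, 50, 99):
    print(i, schedule[i], phase_for_iteration(i, 100))
```

A training loop would query both functions at every iteration and choose the guidance path (single LCM network, or LCM plus Euler solver calibration) according to the returned phase.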