Harnessing the Power of Training-Free Techniques in Text-to-2D Generation for Text-to-3D Generation via Score Distillation Sampling

Junhong Lee, Seungwook Kim, Minsu Cho

TL;DR

This work addresses the gap in understanding how training-free diffusion guidance techniques, notably Classifier-Free Guidance (CFG) and FreeU, affect Score Distillation Sampling (SDS) used for text-to-3D generation via 2D lifting. It introduces a dynamic scaling scheme that schedules FreeU by diffusion timesteps and CFG by optimization iterations, enabling a balanced improvement of texture detail, surface smoothness, and geometric stability in 3D outputs. The approach is validated across SDS-based pipelines (including MVDream, DreamFusion, and Magic3D) and through a user study, showing improved perceived quality and CLIP-consistent results, with generalization to other SDS methods. The findings highlight practical implications for designing training-free interventions in diffusion-based 3D generation, offering a path toward high-fidelity multi-view 3D content with manageable artifacts.

Abstract

Recent studies show that simple training-free techniques, e.g. Classifier-Free Guidance (CFG) or FreeU, can dramatically improve the quality of text-to-2D generation outputs. However, these training-free techniques have been underexplored through the lens of Score Distillation Sampling (SDS), which is a popular and effective technique to leverage the power of pretrained text-to-2D diffusion models for various tasks. In this paper, we aim to shed light on the effect such training-free techniques have on SDS, via a particular application of text-to-3D generation via 2D lifting. We present our findings, which show that varying the scales of CFG presents a trade-off between object size and surface smoothness, while varying the scales of FreeU presents a trade-off between texture details and geometric errors. Based on these findings, we provide insights into how we can effectively harness training-free techniques for SDS, by scaling such techniques dynamically with respect to the timestep or optimization iteration step. We show that using our proposed scheme strikes a favorable balance between texture details and surface smoothness in text-to-3D generations, while preserving the size of the output and mitigating the occurrence of geometric defects.

Paper Structure

This paper contains 24 sections, 9 equations, 15 figures, 4 tables.

Figures (15)

  • Figure 1: FreeU on multi-view diffusion. Results for the prompt 'a DSLR photo of a frog wearing a sweater, 3D asset.' From bottom to top: scaling only backbone features, scaling both backbone and skip features, scaling with values symmetric to $1.0$, and no scaling. Each scaling follows values suggested by FreeU.
  • Figure 2: FreeU on Score Distillation Sampling. These are generated from the prompt 'Dragon armor, 3D asset.' While FreeU scaling enhances detail capture in SDS, it also increases the risk of geometric defects, such as the body distortion shown above. Our proposed dynamic scaling technique preserves detail while avoiding geometric issues. As in diffusion sampling, skip feature scaling in SDS does not significantly contribute to quality improvement.
  • Figure 3: CFG on Score Distillation Sampling. The guidance weights, which serve as CFG scaling variables, are 50 (origin), 10 (small CFG), 100 (large CFG), and 100 to 10 (dynamic scaling) from left to right. Scaling with larger values increases the object's size and roughens the surface, and vice versa. Our dynamic scaling technique allows the object to maintain its size while achieving a surface smoothness comparable to when the guidance weight is 10.
  • Figure 4: Samples generated by score distillation with or without dynamic scaling training-free techniques.
  • Figure 5: Effects of harnessing training-free techniques on DreamFusion and Magic3D.
  • ...and 10 more figures
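
The Figure 3 caption describes a CFG guidance weight that moves from 100 to 10 over the course of optimization. The following is a minimal sketch of such a schedule, not the authors' implementation. The linear interpolation shape, the function names `cfg_weight` and `cfg_noise`, and the total step count are assumptions made for illustration; only the endpoint values 100 and 10 come from the caption. The noise combination uses one common CFG convention, `eps_uncond + w * (eps_cond - eps_uncond)`, which may differ from the convention used in a given SDS codebase.

```python
from typing import List


def cfg_weight(step: int, total_steps: int,
               w_start: float = 100.0, w_end: float = 10.0) -> float:
    """Guidance weight at a given optimization iteration.

    Assumption: the weight decays linearly from w_start to w_end.
    The source only states the endpoints (100 to 10), not the shape.
    """
    if total_steps <= 1:
        return w_end
    # Fraction of optimization completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return w_start + frac * (w_end - w_start)


def cfg_noise(eps_uncond: List[float], eps_cond: List[float],
              weight: float) -> List[float]:
    """Combine unconditional and conditional noise predictions with CFG.

    Plain Python lists stand in for the tensors a real pipeline would use.
    """
    return [u + weight * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    total = 101  # hypothetical number of optimization iterations
    for step in (0, 50, 100):
        w = cfg_weight(step, total)
        guided = cfg_noise([0.0, 0.5], [1.0, 0.25], w)
        print(step, w, guided)
```

Under this sketch, early iterations use a large weight (which Figure 3 associates with larger objects and rougher surfaces) and late iterations use a small weight (associated with smoother surfaces). A FreeU schedule over diffusion timesteps could follow the same pattern with the timestep in place of the iteration index, though the source does not give its values here.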