Text-to-3D with Classifier Score Distillation

Xin Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Song-Hai Zhang, Xiaojuan Qi

TL;DR

This work reevaluates the role of classifier-free guidance in score-distillation-based text-to-3D generation and introduces Classifier Score Distillation (CSD), which relies solely on the implicit classifier score from diffusion models. By reframing the optimization around the classifier term, the authors derive enhancements such as annealed negative prompts, text-guided editing, and connections to Variational Score Distillation, achieving state-of-the-art results in 3D generation and texture synthesis with competitive efficiency. Extensive experiments across NeRF-to-mesh pipelines and texture synthesis demonstrate improved alignment with prompts, realistic appearances, and favorable user studies. The work provides a new, practical perspective on diffusion priors in 3D generation and suggests promising avenues for future distribution-based objectives and 2D-to-3D consistency analysis.

Abstract

Text-to-3D generation has made remarkable progress recently, particularly with methods based on Score Distillation Sampling (SDS) that leverages pre-trained 2D diffusion models. While the usage of classifier-free guidance is well acknowledged to be crucial for successful optimization, it is considered an auxiliary trick rather than the most essential component. In this paper, we re-evaluate the role of classifier-free guidance in score distillation and discover a surprising finding: the guidance alone is enough for effective text-to-3D generation tasks. We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation. This new perspective reveals new insights for understanding existing techniques. We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing, achieving results superior to those of state-of-the-art methods. Our project page is https://xinyu-andy.github.io/Classifier-Score-Distillation
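The claim that "the guidance alone is enough" can be made concrete with a small sketch. It assumes the common classifier-free-guidance convention in which the guided noise prediction is the text-conditioned prediction plus a weighted difference between the conditioned and unconditioned predictions. The function names `sds_direction` and `csd_direction`, and the random arrays standing in for a diffusion model's outputs, are illustrative assumptions and not the authors' code.

```python
import numpy as np


def csd_direction(eps_cond, eps_uncond):
    """Implicit classifier score: conditioned minus unconditioned noise prediction.

    Under the assumption stated above, CSD uses only this term as the
    per-pixel update direction for the rendered image.
    """
    return eps_cond - eps_uncond


def sds_direction(eps_cond, eps_uncond, noise, guidance_weight):
    """SDS update direction with classifier-free guidance (assumed convention).

    Splits into a generative-prior term (eps_cond - noise) and the
    classifier term scaled by the guidance weight.
    """
    generative_prior = eps_cond - noise
    return generative_prior + guidance_weight * csd_direction(eps_cond, eps_uncond)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for the outputs of a pre-trained noise predictor (hypothetical).
    eps_cond = rng.normal(size=(4, 4))
    eps_uncond = rng.normal(size=(4, 4))
    noise = rng.normal(size=(4, 4))

    full = sds_direction(eps_cond, eps_uncond, noise, guidance_weight=100.0)
    classifier_only = 100.0 * csd_direction(eps_cond, eps_uncond)
    # Fraction of the SDS direction's norm contributed by the classifier term.
    print(np.linalg.norm(classifier_only) / np.linalg.norm(full))
```

With a large guidance weight the classifier term accounts for most of the combined direction in this toy setup. That is consistent with the abstract's observation but is not a substitute for the paper's experiments.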

Paper Structure

This paper contains 26 sections, 14 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: (a) The gradient norm during optimization. (b) Optimization results through different guidance weights.
  • Figure 2: Qualitative comparisons to baselines for text-to-3D generation. Our method can generate 3D scenes that align well with input text prompts with realistic and detailed appearances.
  • Figure 3: Qualitative comparisons to baselines for text-guided texture synthesis on 3D meshes. Our method generates more detailed and photo-realistic textures.
  • Figure 4: Ablation study on negative prompts and annealed negative classifier scores.
  • Figure 5: Demonstration of CSD in text-guided 3D editing tasks. Our method effectively modifies attributes based on the given prompt while faithfully preserving the remaining features.
  • ...and 1 more figure