Stable Score Distillation for High-Quality 3D Generation
Boshi Tang, Jianan Wang, Zhiyong Wu, Lei Zhang
TL;DR
This work provides a theoretical framework for Score Distillation Sampling (SDS) by decomposing its gradient estimator into mode-seeking, mode-disengaging, and variance-reducing components, identifying the root causes of over-smoothing and implausibility. It introduces Stable Score Distillation (SSD), a simple, timestep-aware estimator that combines these terms with adaptive variance reduction. SSD improves the quality of generated 3D content while remaining compatible with existing diffusion-based frameworks. The authors validate SSD through numerical simulations and text-to-3D experiments, reporting better prompt alignment, crisper geometry, and richer color, and support it with extensive ablations and proofs of key properties. The findings offer practical guidance for 3D generation workflows and connect common optimization practices to the quality of diffusion-based 3D synthesis.
Abstract
Although Score Distillation Sampling (SDS) has exhibited remarkable performance in conditional 3D content generation, a comprehensive understanding of its formulation is still lacking, which hinders the development of 3D generation. In this work, we decompose SDS into a combination of three functional components, namely mode-seeking, mode-disengaging, and variance-reducing terms, and analyze the properties of each. We show that problems such as over-smoothness and implausibility result from intrinsic deficiencies of the first two terms, and we propose a more advanced variance-reducing term than the one introduced by SDS. Based on this analysis, we propose a simple yet effective approach named Stable Score Distillation (SSD), which strategically orchestrates each term for high-quality 3D generation and can be readily incorporated into various 3D generation frameworks and 3D representations. Extensive experiments validate the efficacy of our approach, demonstrating its ability to generate high-fidelity 3D content without succumbing to issues such as over-smoothness.
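To make the three-term split concrete, here is a minimal sketch of the per-element SDS residual. It assumes DreamFusion-style classifier-free guidance, eps_hat = (1 + w) * eps_cond - w * eps_uncond, where eps_cond and eps_uncond are the denoiser's conditional and unconditional noise predictions and eps is the injected noise. Under that form, eps_hat - eps splits exactly into w * (eps_cond - eps_uncond), eps_cond, and -eps. Mapping these three pieces to the mode-disengaging, mode-seeking, and variance-reducing labels is our assumption, not the paper's stated definitions. The denoiser is a hypothetical toy stand-in (`toy_predictions`), the timestep weight and rendering Jacobian are omitted, and nothing here implements SSD's own estimator.

```python
import random
import statistics


def guided_noise(eps_cond, eps_uncond, w):
    """Classifier-free guidance in the (1 + w) * cond - w * uncond form (assumed)."""
    return [(1 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]


def sds_residual(eps_cond, eps_uncond, eps, w):
    """Per-element SDS residual eps_hat - eps.

    The timestep weight w(t) and the Jacobian dx/dtheta are omitted; they
    scale every term equally and do not change the decomposition.
    """
    eps_hat = guided_noise(eps_cond, eps_uncond, w)
    return [h - e for h, e in zip(eps_hat, eps)]


def decompose(eps_cond, eps_uncond, eps, w):
    """Split the residual into three terms (term labels are an assumption)."""
    mode_disengaging = [w * (c - u) for c, u in zip(eps_cond, eps_uncond)]
    mode_seeking = list(eps_cond)
    variance_reducing = [-e for e in eps]
    return mode_disengaging, mode_seeking, variance_reducing


def toy_predictions(eps, rng):
    """Hypothetical toy denoiser: mostly recovers the injected noise plus an offset."""
    eps_cond = [0.9 * e + 0.3 + 0.05 * rng.gauss(0, 1) for e in eps]
    eps_uncond = [0.9 * e + 0.1 + 0.05 * rng.gauss(0, 1) for e in eps]
    return eps_cond, eps_uncond


if __name__ == "__main__":
    rng = random.Random(0)
    eps = [rng.gauss(0, 1) for _ in range(10_000)]
    eps_cond, eps_uncond = toy_predictions(eps, rng)
    w = 7.5  # guidance scale, chosen only for illustration

    residual = sds_residual(eps_cond, eps_uncond, eps, w)
    md, ms, vr = decompose(eps_cond, eps_uncond, eps, w)
    recombined = [a + b + c for a, b, c in zip(md, ms, vr)]
    max_err = max(abs(r - s) for r, s in zip(residual, recombined))
    print(f"max |residual - (md + ms + vr)| = {max_err:.2e}")

    # Control-variate effect: adding -eps cancels most of eps_cond's spread.
    ms_plus_vr = [a + b for a, b in zip(ms, vr)]
    print(f"var(mode-seeking alone)          = {statistics.pvariance(ms):.3f}")
    print(f"var(mode-seeking + var-reducing) = {statistics.pvariance(ms_plus_vr):.3f}")
```

In this toy setting the recombination error stays at floating-point rounding level. Adding the -eps term removes most of the spread of the conditional prediction, which is the control-variate intuition behind calling that term variance-reducing.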
