VividDreamer: Invariant Score Distillation For Hyper-Realistic Text-to-3D Generation
Wenjie Zhuo, Fan Ma, Hehe Fan, Yi Yang
TL;DR
This work addresses over-saturation and over-smoothing in Score Distillation Sampling (SDS) for text-to-3D generation by decoupling SDS into a reconstruction term and a classifier-free guidance term. It introduces Invariant Score Distillation (ISD), which replaces the reconstruction term with an invariant score term derived from DDIM sampling, $\delta_{\text{inv}} = \epsilon_\phi(z_{t-c}; y, t-c) - \epsilon_\phi(z_t; y, t)$. This substitution permits a conventional guidance scale and reduces reconstruction-induced errors. ISD combines $\delta_{\text{inv}}$ with the classifier-free guidance term $\delta_{\text{cls}} = \epsilon_\phi(z_t; y, t) - \epsilon_\phi(z_t; \varnothing, t)$, using a time-varying weight $\lambda(t)$ and a fixed guidance weight, which preserves detail while avoiding over-saturation. Extensive experiments on text-to-3DGS and text-to-NeRF show that single-stage optimization with ISD yields realistic, highly detailed 3D objects and outperforms several baselines on both CLIP-based quantitative metrics and qualitative comparisons, while remaining efficient and stable.
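The ISD update direction described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_eps` is a hypothetical stand-in for the diffusion model's noise predictor $\epsilon_\phi$; the linear beta schedule, the deterministic DDIM step used to obtain $z_{t-c}$ from $z_t$, the linear combination $\lambda(t)\,\delta_{\text{inv}} + \omega\,\delta_{\text{cls}}$, and the example values of $\lambda(t)$, $c$, and $\omega$ are all assumptions. The timestep weighting $w(t)$ and backpropagation through the renderer are omitted.

```python
import numpy as np

# Assumption: a standard linear beta schedule with T = 1000 steps (not taken from the paper).
T = 1000
ALPHA_BAR = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))


def toy_eps(z, cond, t):
    """Hypothetical stand-in for eps_phi(z; cond, t); cond=None plays the role of the null prompt."""
    shift = 0.1 * cond if cond is not None else 0.0
    return np.tanh(z * (1.0 + t / T)) + shift


def add_noise(z0, eps, t):
    """Forward diffusion: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    a = ALPHA_BAR[t]
    return np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps


def ddim_step(z_t, eps, t, s):
    """Deterministic DDIM (eta = 0) update from timestep t to an earlier timestep s."""
    a_t, a_s = ALPHA_BAR[t], ALPHA_BAR[s]
    x0_hat = (z_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_s) * x0_hat + np.sqrt(1.0 - a_s) * eps


def isd_direction(eps_fn, z_t, y, t, c, lam, omega):
    """Assumed ISD direction lam(t) * delta_inv + omega * delta_cls for one noisy latent."""
    eps_cond = eps_fn(z_t, y, t)
    eps_uncond = eps_fn(z_t, None, t)
    # Invariant score: compare conditional predictions at z_t and at the DDIM-stepped z_{t-c}.
    z_prev = ddim_step(z_t, eps_cond, t, t - c)
    delta_inv = eps_fn(z_prev, y, t - c) - eps_cond
    # Classifier-free guidance term.
    delta_cls = eps_cond - eps_uncond
    return lam(t) * delta_inv + omega * delta_cls


rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 8, 8))   # toy latent of a rendered view
eps = rng.standard_normal(z0.shape)   # sampled noise
t, c = 500, 20                        # illustrative timestep and DDIM interval
z_t = add_noise(z0, eps, t)
g = isd_direction(toy_eps, z_t, y=1.0, t=t, c=c, lam=lambda t: t / T, omega=7.5)
print(g.shape)
```

Note that no sampled noise $\epsilon$ appears in $\delta_{\text{inv}}$, unlike the SDS reconstruction term; in this sketch, setting `c = 0` makes $\delta_{\text{inv}}$ numerically zero, because the DDIM step then returns $z_t$ itself.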
Abstract
This paper presents Invariant Score Distillation (ISD), a novel method for high-fidelity text-to-3D generation. ISD aims to tackle the over-saturation and over-smoothing problems of Score Distillation Sampling (SDS). In this paper, SDS is decoupled into a weighted sum of two components: a reconstruction term and a classifier-free guidance term. We find experimentally that over-saturation stems from a large classifier-free guidance scale, while over-smoothing comes from the reconstruction term. To overcome these problems, ISD replaces the reconstruction term in SDS with an invariant score term derived from DDIM sampling. This allows a medium classifier-free guidance scale and mitigates reconstruction-related errors, thus preventing both over-smoothing and over-saturation. Extensive experiments demonstrate that our method substantially improves on SDS and produces realistic 3D objects through single-stage optimization.
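For reference, the decomposition of SDS into a reconstruction term and a classifier-free guidance term can be written out under the standard SDS and classifier-free guidance conventions. This is a sketch under those assumed conventions; the paper's exact notation and weighting may differ. With guidance scale $\omega$, noise $\epsilon$, timestep weight $w(t)$, and rendered latent $z = g(\theta)$:

$$
\begin{aligned}
\hat{\epsilon}_\phi(z_t; y, t) &= \epsilon_\phi(z_t; \varnothing, t) + \omega\,\big(\epsilon_\phi(z_t; y, t) - \epsilon_\phi(z_t; \varnothing, t)\big),\\
\hat{\epsilon}_\phi(z_t; y, t) - \epsilon &= \underbrace{\big(\epsilon_\phi(z_t; y, t) - \epsilon\big)}_{\delta_{\text{rec}}} + (\omega - 1)\,\underbrace{\big(\epsilon_\phi(z_t; y, t) - \epsilon_\phi(z_t; \varnothing, t)\big)}_{\delta_{\text{cls}}},\\
\nabla_\theta \mathcal{L}_{\text{SDS}} &= \mathbb{E}_{t,\epsilon}\!\left[w(t)\,\big(\delta_{\text{rec}} + (\omega - 1)\,\delta_{\text{cls}}\big)\,\frac{\partial z}{\partial \theta}\right].
\end{aligned}
$$

ISD then replaces $\delta_{\text{rec}}$ with the time-weighted invariant score $\lambda(t)\,\delta_{\text{inv}}$, keeping the guidance term with a fixed, moderate weight.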
