SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity

Peihao Wang, Zhiwen Fan, Dejia Xu, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra

TL;DR

This work tackles high-variance gradient estimates in text-to-3D score distillation by reinterpreting SDS and VSD as applications of control variates and introducing Stein Score Distillation (SSD), which is based on Stein's identity. SSD admits arbitrary baseline functions, because the control variates built from them via Stein's identity are zero-mean by construction. This lets priors such as monocular depth be integrated into the distillation objective, and a learnable weighting mechanism is included to minimize gradient variance. The authors instantiate SteinDreamer with a monocular depth or normal estimator and report reduced variance, improved texture and geometry, and faster convergence than DreamFusion and ProlificDreamer on both object and scene generation. The results suggest that explicit variance control via Stein-based baselines yields practical gains in speed and quality for text-guided 3D synthesis, and that the framework can accommodate additional priors and architectures.
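
For reference, the identity behind SSD can be written in a generic textbook form. The symbols $p$, $\phi$, $f$, and $\mu$ below are illustrative notation and are not necessarily those used in the paper. For a smooth density $p$ and a smooth scalar function $\phi$ that decays sufficiently fast at infinity, Stein's identity reads

$$
\mathbb{E}_{x \sim p}\left[\nabla_x \log p(x)\,\phi(x) + \nabla_x \phi(x)\right] = 0 .
$$

Because the bracketed term has zero mean for any admissible $\phi$, it can be added to a Monte Carlo integrand $f$ with any weight $\mu$ without changing the expectation:

$$
\mathbb{E}_{x \sim p}\left[f(x)\right] = \mathbb{E}_{x \sim p}\left[f(x) + \mu\left(\nabla_x \log p(x)\,\phi(x) + \nabla_x \phi(x)\right)\right] .
$$

The expectation is unchanged while the variance depends on $\phi$ and $\mu$, which is why both can be chosen, or learned, to reduce variance.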

Abstract

Score distillation has emerged as one of the most prevalent approaches for text-to-3D asset synthesis. Essentially, score distillation updates 3D parameters by lifting and back-propagating scores averaged over different views. In this paper, we reveal that the gradient estimation in score distillation inherently suffers from high variance. Through the lens of variance reduction, the effectiveness of SDS and VSD can be interpreted as applications of various control variates to the Monte Carlo estimator of the distilled score. Motivated by this rethinking and based on Stein's identity, we propose a more general solution to reduce variance for score distillation, termed Stein Score Distillation (SSD). SSD incorporates control variates constructed from Stein's identity, allowing for arbitrary baseline functions. This enables us to include flexible guidance priors and network architectures to explicitly optimize for variance reduction. In our experiments, the overall pipeline, dubbed SteinDreamer, is implemented by instantiating the control variate with a monocular depth estimator. The results suggest that SSD can effectively reduce the distillation variance and consistently improve visual quality for both object- and scene-level generation. Moreover, we demonstrate that SteinDreamer achieves faster convergence than existing methods due to more stable gradient updates.
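
The control-variate idea in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the SteinDreamer pipeline: it assumes a one-dimensional standard normal variable (whose score is $-x$), a hand-picked baseline `phi(x) = x`, and a toy integrand `x**2 + x`. The function names `stein_control_variate` and `optimal_weight` are hypothetical. In the paper the baseline is a depth or normal estimator and the weight is learned; here the variance-minimizing scalar weight is simply computed in closed form from the samples.

```python
import numpy as np


def stein_control_variate(x, phi, dphi):
    """Zero-mean term from Stein's identity for x ~ N(0, 1).

    The score of N(0, 1) is -x, so E[dphi(x) - x * phi(x)] = 0.
    """
    return dphi(x) - x * phi(x)


def optimal_weight(f_vals, c_vals):
    """Variance-minimizing scalar weight: mu = -Cov(f, c) / Var(c)."""
    cov = np.cov(f_vals, c_vals)  # 2x2 sample covariance matrix
    return -cov[0, 1] / cov[1, 1]


rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)

# Toy integrand (an assumption for illustration); its true mean is 1.
f_vals = x**2 + x

# Baseline phi(x) = x gives the control variate c(x) = 1 - x**2.
c_vals = stein_control_variate(
    x, phi=lambda z: z, dphi=lambda z: np.ones_like(z)
)

mu = optimal_weight(f_vals, c_vals)
reduced = f_vals + mu * c_vals  # same expectation, lower variance

print("mean of control variate:", c_vals.mean())
print("plain estimate / variance:  ", f_vals.mean(), f_vals.var())
print("reduced estimate / variance:", reduced.mean(), reduced.var())
```

With this baseline the quadratic part of the integrand is cancelled, so the per-sample variance drops from about 3 to about 1 while the estimate of the mean stays near 1. Estimating the weight from the same samples introduces a small bias that vanishes as the sample size grows.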

Paper Structure (38 sections, 19 equations, 12 figures, 4 tables)

Figures (12)

  • Figure 1: A gallery of text-to-3D results generated by SteinDreamer. Our method can synthesize large-scale scenes with smooth geometries and rich textures according to complex text prompts. Zoom in for the best view.
  • Figure 2: Comparison between SDS, VSD, and SSD in 2D space. We monitor the variance of $\boldsymbol{\Delta}_{SDS}$, $\boldsymbol{\Delta}_{VSD}$, and $\boldsymbol{\Delta}_{SSD}$ every 100 training steps. We show that the variance level is highly correlated with the final performance and convergence speed.
  • Figure 3: Pipeline of SteinDreamer. We incorporate control variates constructed by Stein's identity into a score distillation pipeline, allowing for arbitrary baseline functions. In practice, we implement the baseline functions with a monocular depth or normal estimator.
  • Figure 4: Scene-level qualitative comparisons. Compared to existing methods, SteinDreamer w/ normal estimator presents more realistic textures with better details.
  • Figure 5: Object-level qualitative comparisons. Compared to existing methods, our SteinDreamer w/ normal estimator delivers smoother geometry, more detailed texture, and fewer floater artifacts.
  • ...and 7 more figures