Diverse Score Distillation

Yanbo Xu, Jayanth Srinivasa, Gaowen Liu, Shubham Tulsiani

TL;DR

Diverse Score Distillation (DSD) tackles the limited diversity of diffusion-guided 3D optimization by constraining the optimization to follow diffusion-sampling trajectories that start from different seeds, inspired by the PF-ODE view of denoising diffusion models. Because 3D renderings cannot perfectly track 2D diffusion paths, DSD introduces an interpolation-based correction that re-expresses each trajectory in terms of the current render and stochastic noise, enabling both fidelity and explicit diversity. Empirically, DSD yields higher-fidelity yet more diverse outputs across 2D optimization, text-to-3D generation, and single-view reconstruction, outperforming SDS, ASD, SDI, and consistent-ODE baselines in both quality and variety, while remaining compatible with standard CFG settings. The approach broadens the practical utility of diffusion priors for 3D content creation and paves the way for more diverse, multi-view, and multi-modal 3D generation pipelines.

Abstract

Score distillation of 2D diffusion models has proven to be a powerful mechanism to guide 3D optimization, for example enabling text-based 3D generation or single-view reconstruction. A common limitation of existing score distillation formulations, however, is that the outputs of the (mode-seeking) optimization are limited in diversity despite the underlying diffusion model being capable of generating diverse samples. In this work, inspired by the sampling process in denoising diffusion, we propose a score formulation that guides the optimization to follow generation paths defined by random initial seeds, thus ensuring diversity. We then present an approximation to adopt this formulation for scenarios where the optimization may not precisely follow the generation paths (e.g., a 3D representation whose renderings evolve in a co-dependent manner). We showcase the applications of our "Diverse Score Distillation" (DSD) formulation across tasks such as 2D optimization, text-based 3D inference, and single-view reconstruction. We also empirically validate DSD against prior score distillation formulations and show that it significantly improves sample diversity while preserving fidelity.

Paper Structure

This paper contains 19 sections, 16 equations, 12 figures, 3 tables, 4 algorithms.

Figures (12)

  • Figure 1: Diverse Score Distillation. We present a sampling-inspired score distillation formulation that allows obtaining diverse (3D) outputs via different initial optimization seeds. * "A DSLR photo of".
  • Figure 2: DDIM ODE Trajectory. When noisy image $\mathbf{x}(t)$ is sampled along a DDIM ODE trajectory, there is an induced process in the one-step prediction space $\mathbf{x}_0(t)$.
  • Figure 3: Overview of DSD. A unique ODE starting point $\epsilon^*$ is assigned to each 3D shape throughout the optimization process. Renderings from different views are assumed to be on the view-conditioned ODE, starting from $\epsilon^*$. At each iteration, DSD simulates the corresponding ODE up to time $t$ and obtains noise prediction $\epsilon(t)$ from the ODE. The rendered view is connected to the ODE by an interpolation approximation, which is then used to obtain the gradient.
  • Figure 4: 2D Distillation Results using DDIM as reference. The prompts are "A hamburger", "A dragon with flames coming out of its mouth" and "An apple". We assign a fixed initial noise for each grid. When the number of DDIM steps equals the number of optimization steps, our method (DSD*) resembles the DDIM sample. When the two differ, the optimized image deviates from the original ODE by a small margin. We observe that DSD yields more diverse and plausible generations compared to the alternatives.
  • Figure 5: Generation Comparison. We visualize 4 text-to-3D generations from various score distillation methods. We find that DSD is capable of generating high-quality 3D shapes while being more diverse compared to prior methods. Please see supplementary for videos.
  • ...and 7 more figures
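
The Figure 2 caption describes a deterministic DDIM ODE trajectory $\mathbf{x}(t)$ and the process it induces in the one-step prediction space $\mathbf{x}_0(t)$. The sketch below illustrates that idea on a toy problem and is not the authors' implementation. It assumes a 1-D Gaussian data distribution, so that the optimal denoiser has a closed form and can stand in for a trained diffusion model, and it assumes a cosine variance-preserving schedule. All names (`MU`, `S`, `alpha_sigma`, `predict`, `ddim_trajectory`) are hypothetical.

```python
import math
import random

# Toy 1-D "data distribution" N(MU, S^2); both values are illustrative assumptions.
MU, S = 2.0, 0.5


def alpha_sigma(t):
    """Assumed variance-preserving schedule: x(t) = alpha_t * x0 + sigma_t * eps."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def predict(x_t, t):
    """Closed-form E[x0 | x(t)] and E[eps | x(t)] for Gaussian data.

    This stands in for a trained denoiser; a real model would be a network call.
    """
    alpha, sigma = alpha_sigma(t)
    var = alpha**2 * S**2 + sigma**2      # marginal variance of x(t)
    resid = (x_t - alpha * MU) / var
    x0_hat = MU + alpha * S**2 * resid    # one-step prediction x0(t)
    eps_hat = sigma * resid               # noise prediction eps(t)
    return x0_hat, eps_hat


def ddim_trajectory(seed, num_steps=1000):
    """Run deterministic DDIM from t=1 to t=0, starting at the noise fixed by `seed`.

    Returns the final sample and the induced one-step prediction path x0(t).
    """
    x = random.Random(seed).gauss(0.0, 1.0)   # eps*: x(1) is pure noise
    ts = [1.0 - i / num_steps for i in range(num_steps + 1)]
    x0_path = []
    for t, t_next in zip(ts[:-1], ts[1:]):
        x0_hat, eps_hat = predict(x, t)
        x0_path.append(x0_hat)
        alpha_next, sigma_next = alpha_sigma(t_next)
        # DDIM update: re-noise the one-step prediction to the next noise level.
        x = alpha_next * x0_hat + sigma_next * eps_hat
    return x, x0_path


if __name__ == "__main__":
    for seed in range(3):
        sample, path = ddim_trajectory(seed)
        print(f"seed={seed}  x0(t=1)={path[0]:.3f}  final sample={sample:.3f}")
```

In this toy setting every trajectory's one-step prediction starts at the data mean and ends at a seed-specific sample, so the seed alone determines which sample is reached. For Gaussian data the exact ODE endpoint is `MU + S * eps*`, which the discretized trajectory approaches as `num_steps` grows.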