Diverse Score Distillation
Yanbo Xu, Jayanth Srinivasa, Gaowen Liu, Shubham Tulsiani
TL;DR
Diverse Score Distillation (DSD) addresses the limited diversity of diffusion-guided 3D optimization by constraining the optimization to follow diffusion sampling trajectories that start from different random seeds, an idea inspired by the probability-flow ODE (PF-ODE) view of denoising diffusion models. Because the renderings of a 3D representation cannot perfectly track 2D diffusion paths, DSD adds an interpolation-based correction that re-expresses each trajectory in terms of the current render and stochastic noise, which preserves fidelity while keeping the seed-dependent diversity explicit. Empirically, DSD produces more diverse 3D outputs at equal or higher fidelity across 2D optimization, text-to-3D generation, and single-view reconstruction. It outperforms SDS, ASD, SDI, and consistent-ODE baselines in both quality and variety while remaining compatible with standard classifier-free guidance (CFG) settings. The approach broadens the practical utility of diffusion priors for 3D content creation and opens the way to more diverse, multi-view, and multi-modal 3D generation pipelines.
Abstract
Score distillation of 2D diffusion models has proven to be a powerful mechanism to guide 3D optimization, for example enabling text-based 3D generation or single-view reconstruction. A common limitation of existing score distillation formulations, however, is that the outputs of the (mode-seeking) optimization are limited in diversity despite the underlying diffusion model being capable of generating diverse samples. In this work, inspired by the sampling process in denoising diffusion, we propose a score formulation that guides the optimization to follow generation paths defined by random initial seeds, thus ensuring diversity. We then present an approximation to adopt this formulation for scenarios where the optimization may not precisely follow the generation paths (e.g., a 3D representation whose renderings evolve in a co-dependent manner). We showcase the applications of our 'Diverse Score Distillation' (DSD) formulation across tasks such as 2D optimization, text-based 3D inference, and single-view reconstruction. We also empirically validate DSD against prior score distillation formulations and show that it significantly improves sample diversity while preserving fidelity.
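To make the idea of "generation paths defined by random initial seeds" concrete, the following is a minimal toy sketch, not the DSD method itself. It assumes a 1D two-mode Gaussian mixture as the data distribution, a variance-exploding noise schedule, and an analytic score in place of a trained diffusion model; the names `MU`, `S`, `score`, `pf_ode_sample`, and `sample_from_seed` are hypothetical. It shows only that deterministic PF-ODE sampling maps each initial noise value to one fixed output, so different seeds can land in different modes.

```python
import numpy as np

# Assumed toy data distribution: equal mixture of N(-MU, S^2) and N(+MU, S^2).
MU, S = 2.0, 0.1


def score(x, sigma):
    """Analytic score of the toy mixture after adding N(0, sigma^2) noise."""
    v = S**2 + sigma**2  # per-mode variance at noise level sigma
    # tanh(MU * x / v) is the difference of the two posterior mode weights.
    return (MU * np.tanh(MU * x / v) - x) / v


def pf_ode_sample(x_init, sigma_max=10.0, sigma_min=1e-3, steps=200):
    """Euler integration of the PF-ODE dx/dsigma = -sigma * score(x, sigma)."""
    sigmas = np.geomspace(sigma_max, sigma_min, steps)
    x = np.asarray(x_init, dtype=float).copy()
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        x = x + (s_next - s_cur) * (-s_cur * score(x, s_cur))
    return x


def sample_from_seed(seed, sigma_max=10.0):
    """Draw the initial noise from a seed, then follow the deterministic path."""
    x_init = sigma_max * np.random.default_rng(seed).standard_normal()
    return pf_ode_sample(x_init, sigma_max=sigma_max)


if __name__ == "__main__":
    for seed in range(6):
        print(seed, float(sample_from_seed(seed)))
```

In this toy setting the sign of the initial noise decides which mode the path ends in, and rerunning the same seed reproduces the same sample. A mode-seeking objective, by contrast, has no such per-seed path to follow, which is the diversity gap the abstract describes.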
