DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
Linqi Zhou, Andy Shih, Chenlin Meng, Stefano Ermon
TL;DR
DreamPropeller is a drop-in acceleration framework for SDS/VSD-based text-to-3D generation. It generalizes Picard iterations so that gradient-driven updates can be computed in parallel across GPUs. A generalized drift operator and a fixed-point rollout let it handle momentum-based optimizers and representations whose dimensionality changes during optimization (e.g., Gaussian Splatting). It achieves up to 4.7x wall-clock speedups with negligible quality loss across multiple 3D representations and on image-to-3D tasks. Practical strategies such as a sliding window, deterministic gradients, and EMA-based adaptive thresholds help stabilize convergence. The results show that DreamPropeller accelerates both text-to-3D and image-to-3D pipelines while preserving semantic and visual fidelity, which makes high-quality text-to-3D generation more practical for real-world use. The work highlights the value of parallel-in-time computation for 3D generation pipelines and applies to a broad class of diffusion-based 3D representations and optimization strategies.
Abstract
Recent methods such as Score Distillation Sampling (SDS) and Variational Score Distillation (VSD), which use 2D diffusion models for text-to-3D generation, have demonstrated impressive generation quality. However, the long generation time of such algorithms significantly degrades the user experience. To tackle this problem, we propose DreamPropeller, a drop-in acceleration algorithm that can be wrapped around any existing text-to-3D generation pipeline based on score distillation. Our framework generalizes Picard iterations, a classical algorithm for sampling an ODE path in parallel, and can account for non-ODE paths such as momentum-based gradient updates and changes in dimensions during the optimization process, both of which are common in 3D generation. We show that our algorithm trades parallel compute for wall-clock time and empirically achieves up to 4.7x speedup with a negligible drop in generation quality for all tested frameworks.
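To make the idea of Picard iterations concrete, the following is a minimal sketch of the classical scheme on a toy problem, not the DreamPropeller algorithm itself. It assumes plain gradient descent (no momentum, fixed dimension) on the hypothetical objective f(x) = 0.5 * x**2; the names `drift`, `sequential_path`, and `picard_path` are illustrative and do not come from the paper. Each Picard iteration evaluates the drift at every step of the current path guess independently, which is the part that could be spread across devices, and then rebuilds the path with a cumulative sum.

```python
def drift(x, lr=0.01):
    """Increment of one gradient-descent step on the toy objective f(x) = 0.5 * x**2."""
    return -lr * x


def sequential_path(x0, steps):
    """Baseline: compute the optimization path one step at a time."""
    path = [x0]
    for _ in range(steps):
        path.append(path[-1] + drift(path[-1]))
    return path


def picard_path(x0, steps, tol=1e-9):
    """Classical Picard iteration: refine a guess of the whole path at once.

    Update rule: x_t <- x_0 + sum_{i < t} drift(x_i), using the previous guess
    on the right-hand side. Returns the path and the number of iterations used.
    """
    path = [x0] * (steps + 1)  # initial guess: a constant path
    for k in range(1, steps + 1):
        # These evaluations are mutually independent, so they could run in parallel.
        drifts = [drift(x) for x in path[:-1]]

        # Rebuild the path as a running sum of the drifts.
        new_path, total = [x0], 0.0
        for d in drifts:
            total += d
            new_path.append(x0 + total)

        err = max(abs(n - o) for n, o in zip(new_path, path))
        path = new_path
        if err < tol:  # the path stopped changing: fixed point reached
            return path, k
    # After `steps` iterations the path is exact up to floating-point rounding.
    return path, steps


if __name__ == "__main__":
    seq = sequential_path(1.0, 50)
    par, iters = picard_path(1.0, 50)
    print(f"Picard iterations: {iters}, sequential steps: 50")
    print(f"max path difference: {max(abs(a - b) for a, b in zip(seq, par)):.2e}")
```

In this toy setting the fixed point is reached in fewer iterations than the number of sequential steps, which illustrates the trade of parallel compute for wall-clock time that the abstract describes. Momentum-based updates and changing dimensions are outside the scope of this sketch.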
