
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling

Linqi Zhou, Andy Shih, Chenlin Meng, Stefano Ermon

TL;DR

DreamPropeller is a drop-in acceleration framework for SDS/VSD-based text-to-3D generation that generalizes Picard iterations to parallelize gradient-driven updates across GPUs. It handles momentum-based optimizers and representations whose dimension changes during optimization (e.g., Gaussian Splatting) through a generalized drift operator and a fixed-point rollout, achieving up to 4.7x wallclock speedup with negligible quality loss across multiple 3D representations and on image-to-3D tasks. Practical strategies such as a sliding window, deterministic gradients, and EMA-based adaptive thresholds stabilize convergence. The results show that DreamPropeller accelerates both text-to-3D and image-to-3D pipelines while preserving semantic and visual fidelity, making high-quality 3D generation more practical for real-world use. The work illustrates the value of parallel-in-time computation for 3D generation pipelines, and the framework applies to a broad class of diffusion-based 3D representations and optimization strategies.

Abstract

Recent methods such as Score Distillation Sampling (SDS) and Variational Score Distillation (VSD) using 2D diffusion models for text-to-3D generation have demonstrated impressive generation quality. However, the long generation time of such algorithms significantly degrades the user experience. To tackle this problem, we propose DreamPropeller, a drop-in acceleration algorithm that can be wrapped around any existing text-to-3D generation pipeline based on score distillation. Our framework generalizes Picard iterations, a classical algorithm for parallel sampling an ODE path, and can account for non-ODE paths such as momentum-based gradient updates and changes in dimensions during the optimization process as in many cases of 3D generation. We show that our algorithm trades parallel compute for wallclock time and empirically achieves up to 4.7x speedup with a negligible drop in generation quality for all tested frameworks.
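
To make the idea of Picard iterations concrete, here is a minimal sketch for a discretized scalar ODE. It is an illustration under stated assumptions, not the paper's implementation: the drift `drift(x, t) = -x`, the step size `h`, and the function names are all hypothetical. Each iteration evaluates the drift at every time step of the previous iterate (these evaluations are independent, so they could be dispatched in parallel) and then rebuilds the whole trajectory with a cumulative sum.

```python
from itertools import accumulate

def drift(x, t):
    # Hypothetical drift s(x, t), chosen only for illustration: dx/dt = -x.
    return -x

def sequential_euler(x0, T, h):
    # Reference: ordinary step-by-step Euler integration.
    xs = [x0]
    for t in range(T):
        xs.append(xs[-1] + h * drift(xs[-1], t))
    return xs

def picard(x0, T, h, tol=1e-12):
    xs = [x0] * (T + 1)  # iteration-0 guess: a constant trajectory
    for k in range(T):   # at most T iterations reproduce the Euler path
        # These T evaluations read only iteration-k values, so they are
        # independent of each other and could run in parallel.
        drifts = [drift(xs[t], t) for t in range(T)]
        # Rollout: x_{t+1} = x_0 + h * sum_{i <= t} s(x_i, i).
        new_xs = [x0] + [x0 + h * c for c in accumulate(drifts)]
        converged = max(abs(a - b) for a, b in zip(new_xs, xs)) < tol
        xs = new_xs
        if converged:
            return xs, k + 1
    return xs, T

trajectory, iterations = picard(1.0, 20, 0.1)
```

The trade-off is visible in the structure: each Picard iteration costs `T` drift evaluations instead of one, but the number of sequential rounds can be smaller than `T` when the iterates converge early.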

Paper Structure

This paper contains 19 sections, 15 equations, 6 figures, 4 tables, 2 algorithms.

Figures (6)

  • Figure 1: We present two representative examples of applying DreamPropeller. Gray rows denote runtime. Our framework trades parallel compute for speed and achieves more than 4x speedup when applied to both DreamGaussian [Tang_undated-od] and ProlificDreamer [Wang2023-lp] while maintaining the generation quality. At the time when DreamPropeller finishes, the baseline versions (Incomplete) exhibit significantly worse appearance and geometry.
  • Figure 2: Picard dependency graph. Gray nodes have outgoing edges to all subsequent nodes in the $(k+1)$-th iteration and are independent of each other. This allows parallel computation of $s({\mathbf{x}}_\tau^k, \tau)$ for all $\tau \in [0,T-1]$.
  • Figure 3: Overview of DreamPropeller. Starting from the top left, for iteration $k$, we initialize a window of 3D shapes (in green) with dimension $D$ and dispatch them to $p$ GPUs to compute the SDS/VSD gradients in parallel; the gradients are then gathered for rollout using the rule in Eq. \ref{eq:general-rollout}. The resulting shapes (in orange) for iteration $k+1$ are compared to those in iteration $k$. The window is slid forward until the error at that time step is not smaller than the threshold $e$, which is adaptively updated with the mean/median error of the window. Optionally, in the case of VSD, we keep independent copies of the LoRA diffusion model on all GPUs, which are updated independently without extra communication.
  • Figure 4: Visual comparisons. Methods using DreamPropeller achieve equally high-quality generation with a much shorter runtime.
  • Figure 5: Ablation studies on practical choices. Speedup is the ratio of baseline wall-clock runtime to our wall-clock runtime. Relative quality is the ratio of baseline $\text{FID}_\text{CLIP}$ to our $\text{FID}_\text{CLIP}$.
  • ...and 1 more figure
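
The sliding-window procedure described in the Figure 3 caption can be sketched on a toy problem. This is a hedged illustration, not the authors' code: it uses plain gradient descent on a scalar quadratic in place of SDS/VSD gradients on 3D shapes, and `grad`, `windowed_picard_gd`, the squared-difference error, and the exact threshold update (an EMA of the window's median error) are assumptions made for the example. The paper's general rollout rule is not reproduced here.

```python
from itertools import accumulate
from statistics import median

def grad(x):
    # Hypothetical stand-in for an SDS/VSD gradient: d/dx of 0.5 * (x - 1)^2.
    return x - 1.0

def sequential_gd(x0, total_steps, lr=0.1):
    # Reference: ordinary one-step-at-a-time gradient descent.
    x = x0
    for _ in range(total_steps):
        x -= lr * grad(x)
    return x

def windowed_picard_gd(x0, total_steps, lr=0.1, window=8,
                       threshold=1e-12, ema=1.0):
    start, done, rounds = x0, 0, 0
    guesses = [x0] * window  # guesses[i] approximates the iterate at step done + i
    while done < total_steps:
        w = min(window, total_steps - done)
        # Independent gradient evaluations: one per GPU in a parallel setting.
        grads = [grad(g) for g in guesses[:w]]
        # Rollout: rolled[i] is the new estimate of the iterate at step done + i + 1.
        rolled = [start - lr * c for c in accumulate(grads)]
        # Squared change of each estimate relative to the previous round.
        errors = [(rolled[i] - guesses[i + 1]) ** 2 for i in range(w - 1)]
        # Slide past every leading step whose error is below the threshold.
        count = 0
        while count < len(errors) and errors[count] < threshold:
            count += 1
        stride = count + 1  # rolled[0] is exact, so always advance by at least one
        if errors:
            # Assumed update rule: EMA of the window's median error (ema=1.0 disables it).
            threshold = ema * threshold + (1.0 - ema) * median(errors)
        start = rolled[stride - 1]
        tail = rolled[stride - 1:]
        guesses = tail + [rolled[-1]] * (window - len(tail))  # pad new window slots
        done += stride
        rounds += 1
    return start, rounds

x_final, rounds = windowed_picard_gd(0.0, 300)
```

With a fixed, tight threshold the result should track sequential gradient descent closely while using fewer sequential rounds than optimization steps. Setting `ema` below 1 lets the threshold follow the observed window error, which trades some accuracy for larger strides.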