Improved Immiscible Diffusion: Accelerate Diffusion Training by Reducing Its Miscibility

Yiheng Li, Feng Liang, Dan Kondratyuk, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu

TL;DR

This paper identifies trajectory miscibility as a core bottleneck in diffusion model training and proposes immiscible diffusion, a broad, implementation-agnostic approach that reduces the mixing of diffusion trajectories across images. It provides theoretical and empirical evidence that denoising remains stable and diverse under immiscible diffusion, and introduces practical implementations, such as KNN noise selection and image scaling, that deliver large speedups. Across unconditional and conditional image generation, image editing, and robotics planning tasks, immiscible diffusion trains up to over 4x faster while preserving quality and prompt fidelity. The work also connects optimal transport concepts to diffusion training, offering a new perspective on how to design high-efficiency diffusion systems and suggesting directions for future research.

Abstract

The substantial training cost of diffusion models hinders their deployment. Immiscible Diffusion recently showed that reducing diffusion trajectory mixing in the noise space via linear assignment accelerates training by simplifying denoising. To extend immiscible diffusion beyond the inefficient linear assignment under high batch sizes and high dimensions, we refine this concept to a broader miscibility reduction at any layer and by any implementation. Specifically, we empirically demonstrate the bijective nature of the denoising process with respect to immiscible diffusion, ensuring its preservation of generative diversity. Moreover, we provide thorough analysis and show step-by-step how immiscibility eases denoising and improves efficiency. Extending beyond linear assignment, we propose a family of implementations including K-nearest neighbor (KNN) noise selection and image scaling to reduce miscibility, achieving up to >4x faster training across diverse models and tasks including unconditional/conditional generation, image editing, and robotics planning. Furthermore, our analysis of immiscibility offers a novel perspective on how optimal transport (OT) enhances diffusion training. By identifying trajectory miscibility as a fundamental bottleneck, we believe this work establishes a potentially new direction for future research into high-efficiency diffusion training. The code is available at https://github.com/yhli123/Immiscible-Diffusion.
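The starting point named in the abstract, reducing trajectory mixing in noise space via linear assignment, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes NumPy and SciPy, a batch of images already normalized to [-1, 1], and a hypothetical helper named `assign_noise_linear`. Within one batch, it computes pairwise $L_2$ distances between flattened images and freshly sampled Gaussian noises, then uses `scipy.optimize.linear_sum_assignment` to permute the noises so that the total image-noise distance is minimized, which amounts to a discrete optimal transport matching over the batch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def assign_noise_linear(images: np.ndarray, noises: np.ndarray) -> np.ndarray:
    """Hypothetical helper: permute a batch of noises to minimize total L2 distance to images.

    images, noises: arrays of shape (batch, C, H, W) with the same batch size.
    Returns the noises reordered so that row i is the noise assigned to image i.
    """
    b = images.shape[0]
    # Flatten each sample into a vector so distances are taken over all pixels.
    x = images.reshape(b, -1)
    n = noises.reshape(b, -1)
    # cost[i, j] = L2 distance between image i and noise j, shape (b, b).
    cost = cdist(x, n, metric="euclidean")
    # Solve the batch-level assignment problem; for a square cost matrix,
    # row_ind is 0..b-1 in order, so col_ind[i] is the noise paired with image i.
    _, col_ind = linear_sum_assignment(cost)
    return noises[col_ind]


# Usage: pair a toy batch before the usual forward-diffusion step.
rng = np.random.default_rng(0)
images = rng.uniform(-1.0, 1.0, size=(8, 3, 4, 4))
noises = rng.standard_normal(size=(8, 3, 4, 4))
paired = assign_noise_linear(images, noises)

# The assigned pairing is never worse than the random (identity) pairing.
cost_assigned = np.linalg.norm(images.reshape(8, -1) - paired.reshape(8, -1), axis=1).sum()
cost_random = np.linalg.norm(images.reshape(8, -1) - noises.reshape(8, -1), axis=1).sum()
```

SciPy's assignment solver has roughly cubic worst-case cost in the batch size, and the distance matrix grows with image dimension, which is consistent with the abstract's motivation for moving beyond linear assignment at high batch sizes and high dimensions.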

Paper Structure

This paper contains 28 sections, 2 equations, 11 figures, 5 tables, 1 algorithm.

Figures (11)

  • Figure 1: Improved Immiscible Diffusion Theory. (a) While vanilla diffusion trajectories (flows) are mixed (miscible), each noise point is stably correlated with a specific generated image, making many diffusion trajectories irreversible. (b) These irreversible trajectories confuse the denoising process. (c) We introduce immiscible diffusion, which cuts mixed (miscible) diffusion trajectories during training to accelerate diffusion training.
  • Figure 2: Stable correlation between generated images and their noise origins. Here, perturbation means an additional Gaussian noise added to the fixed original Gaussian noise. Even with 20% perturbation, changes to the image are nearly imperceptible; only at 30% perturbation does the object in the image change. These results demonstrate a stable correlation between a region of noise space and a specific generated image.
  • Figure 3: Feature analysis of vanilla (miscible) and immiscible DDIM. Following DDIM notation, $\tau=S$ denotes the layer that denoises from pure noise. We show that immiscible diffusion activates the denoising functions of the noisiest ($\tau\to S$) layers by clarifying their denoising goals, as shown by t-SNE plots of the denoised images across $\tau$'s. This activation improves the FID of images denoised from large $\tau$'s, which leads to better performance and faster convergence of diffusion models.
  • Figure 4: Implementations of Immiscible Diffusion. (a) Miscible Diffusion pairs the batch of images and noises randomly before adding noise to the images. (b)(c)(d) Immiscible Diffusion reduces the miscibility of diffusion by (b) $L_2$ linear assignment between the batch of images and noises, (c) sampling $k$ noises and picking the nearest one (KNN), or (d) scaling images by multiplying their pixel values by a constant $>1$, which reduces overlaps between the diffusible areas of different images.
  • Figure 5: Immiscible diffusion boosts training efficiency. We show the training steps required to reach the best FID achieved by the vanilla models across three diverse diffusion-based architectures. Results consistently show that immiscible diffusion trains significantly faster.
  • ...and 6 more figures
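Panels (c) and (d) of Figure 4 describe two implementations that avoid batch-level assignment. The sketch below illustrates each under stated assumptions; it is not the authors' implementation. It assumes NumPy, images normalized to [-1, 1], and the standard forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. The helper names `knn_noise`, `scale_images`, and `add_noise`, the candidate count `k`, and the scale factor `s` are hypothetical placeholders; this page does not state the values used in the paper.

```python
import numpy as np


def knn_noise(images: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: for each image, draw k Gaussian noises and keep the nearest (L2)."""
    b = images.shape[0]
    # k candidate noises per image, shape (b, k, C, H, W).
    cands = rng.standard_normal(size=(b, k) + images.shape[1:])
    # L2 distance from each image to each of its own candidates, shape (b, k).
    dist = np.linalg.norm(cands.reshape(b, k, -1) - images.reshape(b, 1, -1), axis=-1)
    nearest = dist.argmin(axis=1)
    # Pick candidate nearest[i] for image i; result has shape (b, C, H, W).
    return cands[np.arange(b), nearest]


def scale_images(images: np.ndarray, s: float) -> np.ndarray:
    """Hypothetical helper: multiply pixel values by a constant s > 1 (Figure 4d)."""
    if s <= 1.0:
        raise ValueError("scale factor must be > 1")
    return images * s


def add_noise(x0: np.ndarray, eps: np.ndarray, alpha_bar_t: float) -> np.ndarray:
    """Standard forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps


# Toy batch; k=16, s=2.0 and alpha_bar_t=0.5 are arbitrary illustrative values.
images = np.random.default_rng(0).uniform(-1.0, 1.0, size=(4, 3, 8, 8))

# (c) KNN noise selection: unscaled images, each paired with its nearest of k noises.
eps_knn = knn_noise(images, k=16, rng=np.random.default_rng(1))
x_t_knn = add_noise(images, eps_knn, alpha_bar_t=0.5)

# (d) Image scaling: scaled images with ordinary randomly drawn noise.
eps_rand = np.random.default_rng(2).standard_normal(size=images.shape)
x_t_scaled = add_noise(scale_images(images, s=2.0), eps_rand, alpha_bar_t=0.5)
```

Because each image is compared only against its own `k` candidates, KNN selection grows linearly with batch size instead of requiring a batch-wide assignment, and image scaling needs no noise search at all. The sketch covers only the training-time forward process; how scaled outputs are mapped back to the original pixel range at sampling time is not described here.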