LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights

Thibault Castells, Hyoung-Kyu Song, Bo-Kyeong Kim, Shinkook Choi

TL;DR

LD-Pruner addresses the challenge of deploying Latent Diffusion Models on resource-constrained devices by introducing a task-agnostic, latent-space–guided pruning framework. It collects latent representations under single-operator modifications, defines a task-agnostic operator-importance score as the sum of changes in latent mean and variance, and prunes the least-significant operators while preserving weights to accelerate finetuning, with complexity $O(n m k)$. The method demonstrates substantial inference-speed gains with minimal quality degradation across text-to-image, unconditional image, and unconditional audio tasks, and offers insights into metric design and the value of weight preservation. This work enables practical, efficient deployment of LDMs across modalities by reducing compute and memory without substantial performance loss.
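The scoring step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the summary says only that the score is the sum of changes in latent mean and variance, so measuring each change as a Euclidean norm of per-dimension differences is an assumption, and the names `importance_score`, `rank_operators`, and the toy operators `op_a`, `op_b`, `op_c` are hypothetical. Random arrays stand in for latents produced by a real U-Net.

```python
import numpy as np


def importance_score(latents_orig, latents_mod):
    """Score one operator from two sets of latents of shape (n_gen, d).

    Assumption: each "change" is the Euclidean norm of the per-dimension
    difference in mean (or variance) across the generated latents.
    """
    mean_shift = np.linalg.norm(latents_orig.mean(axis=0) - latents_mod.mean(axis=0))
    var_shift = np.linalg.norm(latents_orig.var(axis=0) - latents_mod.var(axis=0))
    return float(mean_shift + var_shift)


def rank_operators(latents_orig, latents_per_operator):
    """Return operator names sorted from least to most important, plus scores."""
    scores = {
        name: importance_score(latents_orig, latents)
        for name, latents in latents_per_operator.items()
    }
    return sorted(scores, key=scores.get), scores


# Toy stand-ins for latents: 64 generations, 16 latent dimensions.
rng = np.random.default_rng(0)
base = rng.normal(size=(64, 16))
per_operator = {
    "op_a": base + 0.01,   # modifying op_a slightly shifts the mean
    "op_b": base * 3.0,    # modifying op_b changes mean and variance a lot
    "op_c": base.copy(),   # modifying op_c changes nothing
}
order, scores = rank_operators(base, per_operator)
print(order)
```

Operators at the front of `order` change the latents least when modified, so they would be the first pruning candidates. A score of zero, as for `op_c` here, means the modification left the latent statistics untouched.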

Abstract

Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources. However, deploying LDMs on resource-limited devices remains a complex issue, presenting challenges such as memory consumption and inference speed. To address this issue, we introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing LDMs. Traditional pruning methods for deep neural networks are not tailored to the unique characteristics of LDMs, such as the high computational cost of training and the absence of a fast, straightforward and task-agnostic method for evaluating model performance. Our method tackles these challenges by leveraging the latent space during the pruning process, enabling us to effectively quantify the impact of pruning on model performance, independently of the task at hand. This targeted pruning of components with minimal impact on the output allows for faster convergence during training, as the model has less information to re-learn, thereby addressing the high computational cost of training. Consequently, our approach achieves a compressed model that offers improved inference speed and reduced parameter count, while maintaining minimal performance degradation. We demonstrate the effectiveness of our approach on three different tasks: text-to-image (T2I) generation, Unconditional Image Generation (UIG) and Unconditional Audio Generation (UAG). Notably, we reduce the inference time of Stable Diffusion (SD) by 34.9% while simultaneously improving its FID by 5.2% on the MS-COCO T2I benchmark. This work paves the way for more efficient pruning methods for LDMs, enhancing their applicability.

Paper Structure

This paper contains 22 sections, 4 equations, 16 figures, 4 tables, and 1 algorithm.

Figures (16)

  • Figure 1: Samples generated using our compressed models. The proposed compression technique applies structured pruning to LDMs using task-agnostic information. Prompts (left to right): "A multi-colored cat with yellow eyes staring upward", "Candles and flowers neatly placed on a table", "Portrait of a chief indian, 4k, high definition", "A photo of a raccoon wearing an astronaut helmet, looking out of the window at night."
  • Figure 2: Overview of LD-Pruner. Given $k$ operators in the Unet, we generate $k+1$ sets of $N_{gen}$ latent vectors: one set for the original Unet, and one for each Unet where a single operator has been modified. The importance score of each operator is then calculated using a formula specifically designed to compare latent vectors. This formula, sensitive to shifts in both the central tendency and the variability of the latent vectors, generates a comprehensive measure of the importance of each operator.
  • Figure 3: Qualitative comparison on zero-shot MS-COCO benchmark on T2I. The results of previous studies were obtained with their official released models.
  • Figure 4: Type and relative importance of the modified operators in each block of our compressed SD.
  • Figure 5: Evolution of the FID during the training process for the UIG task on the CelebA-HQ $256\times256$ dataset, for two different compression ratios.
  • ...and 11 more figures