Diffusion Product Quantization

Jie Shao, Hanxiao Zhang, Jianxin Wu

TL;DR

This work proposes a method to compress the codebook by evaluating the importance of each vector and removing redundancy, keeping the model size within the desired range. It applies this method to the DiT model on ImageNet, where it achieves competitive generative performance.
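
The codebook compression idea can be illustrated with a small NumPy sketch. It is not the authors' procedure: here "importance" is assumed to be usage count (how many weight sub-vectors map to each codeword), "redundancy" is assumed to mean a codeword lying within a hypothetical tolerance `merge_tol` of an already-kept one, and `budget` is a hypothetical cap on codebook size. The shared codebook pool mentioned in Figure 4 is not modeled. All function and parameter names are illustrative.

```python
import numpy as np


def compress_codebook(codebook, assignments, budget, merge_tol=1e-3):
    """Hypothetical sketch: shrink one codebook to at most `budget` entries.

    codebook:    (k, d) array of centroid vectors
    assignments: (n,) integer codeword index for each weight sub-vector
    Importance is ASSUMED to be usage count; this is not the paper's criterion.
    """
    k = codebook.shape[0]
    usage = np.bincount(assignments, minlength=k)
    order = np.argsort(-usage, kind="stable")  # most-used codewords first
    kept = []
    for idx in order:
        if usage[idx] == 0:
            continue  # unused codewords only cost storage
        # Skip codewords that nearly duplicate an already-kept one (redundancy).
        if kept and np.min(np.linalg.norm(codebook[kept] - codebook[idx], axis=1)) < merge_tol:
            continue
        kept.append(idx)
        if len(kept) == budget:
            break
    new_codebook = codebook[kept]
    # Map every old codeword to its nearest surviving codeword, then remap indices.
    dists = np.linalg.norm(codebook[:, None, :] - new_codebook[None, :, :], axis=2)
    remap = np.argmin(dists, axis=1)
    return new_codebook, remap[assignments]
```

For example, with codewords `[[0, 0], [0, 0.0005], [1, 1], [5, 5]]` and `budget=3`, the near-duplicate `[0, 0.0005]` is dropped and the sub-vectors that used it are redirected to `[0, 0]`.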

Abstract

In this work, we explore the quantization of diffusion models in extreme compression regimes to reduce model size while maintaining performance. We begin by investigating classical vector quantization but find that diffusion models are particularly susceptible to quantization error, with the codebook size limiting generation quality. To address this, we introduce product quantization, which offers improved reconstruction precision and larger capacity, both crucial for preserving the generative capabilities of diffusion models. Furthermore, we propose a method to compress the codebook by evaluating the importance of each vector and removing redundancy, keeping the model size within the desired range. We also introduce an end-to-end calibration approach that adjusts assignments during the forward pass and optimizes the codebook using the DDPM loss. By compressing the model to as low as 1 bit (an over 24-fold reduction in model size), we achieve a balance between compression and quality. We apply our compression method to the DiT model on ImageNet, where it consistently outperforms other quantization approaches and achieves competitive generative performance.
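
Product quantization of a single weight matrix can be sketched as follows, assuming each row is split into contiguous sub-vectors of length `d`, every column group gets its own codebook of `k` centroids, and codebooks are learned with plain Lloyd k-means. The grouping scheme, the use of k-means, and all names (`product_quantize`, `dequantize`, `index_bits_per_weight`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means on rows of x; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d2.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # leave empty clusters where they are
                centroids[j] = members.mean(0)
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, d2.argmin(1)


def product_quantize(w, d, k):
    """Split each row of w (out, in) into in/d sub-vectors; one codebook per group."""
    out_dim, in_dim = w.shape
    assert in_dim % d == 0, "input dim must be divisible by sub-vector length"
    m = in_dim // d
    sub = w.reshape(out_dim, m, d)
    codebooks = np.empty((m, k, d), dtype=w.dtype)
    codes = np.empty((out_dim, m), dtype=np.int64)
    for g in range(m):
        codebooks[g], codes[:, g] = kmeans(sub[:, g, :], k, seed=g)
    return codebooks, codes


def dequantize(codebooks, codes):
    """Rebuild an (out, in) matrix by looking up each group's codeword."""
    m = codebooks.shape[0]
    sub = codebooks[np.arange(m)[None, :], codes]  # (out, m, d)
    return sub.reshape(codes.shape[0], -1)


def index_bits_per_weight(d, k):
    """Index storage per weight, ignoring codebook storage overhead."""
    return float(np.log2(k)) / d
```

Under these assumptions, 1 bit per weight corresponds to, for instance, `d=8, k=256` or `d=4, k=16`; `index_bits_per_weight` ignores the cost of storing the codebooks themselves, which is what the codebook compression step targets.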

Paper Structure

This paper contains 14 sections, 7 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Compression results at low bit-widths, using the DiT-XL/2 model with 250 sampling steps and a classifier-free guidance (CFG) scale of 1.5.
  • Figure 2: Results of VQ compression at 2 bits and 1 bit. The VQ-compressed model shows degradation and noticeable distortions, which become more severe as the bit rate decreases. In contrast, our DPQ method largely preserves the model's capacity (left). Additionally, VQ causes the quantization error in each block's output to accumulate over diffusion steps, whereas our DPQ method keeps the quantization error within a controlled range (right).
  • Figure 3: Product Quantization
  • Figure 4: Our method, DPQ, compresses all learnable parameters to extremely low bit-widths. In stage 1, we quantize each group of weights using an integer index that selects a specific centroid vector from the group’s own codebook. These codebooks are further compressed by a codebook pool, ensuring efficient model size reduction. In stage 2, we calibrate all compressed parameters by adjusting the codewords based on the reconstruction loss of the output features in the forward pass, and by fine-tuning the codebooks in the backward pass.
  • Figure 5: Visualization of generation results from the DPQ-compressed model.
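
The forward-pass part of the stage-2 calibration in Figure 4 (choosing codewords by output-feature reconstruction loss rather than weight-space distance) can be approximated with a greedy coordinate-descent sketch. This is a swapped-in technique under stated assumptions: `x` is assumed to be a batch of calibration activations feeding a linear layer `y = x @ w.T`, and each sub-vector is reassigned to whichever codeword in its group minimizes the squared output error for that row. The backward-pass codebook fine-tuning with the DDPM loss is not shown, and `calibrate_assignments` is a hypothetical name.

```python
import numpy as np


def calibrate_assignments(x, w, codebooks, codes, sweeps=2):
    """Hypothetical greedy, output-aware reassignment of PQ codes.

    x:         (n, in) calibration activations (assumed available)
    w:         (out, in) full-precision weights
    codebooks: (m, k, d) one codebook per column group
    codes:     (out, m) current codeword indices (not modified in place)
    """
    m, k, d = codebooks.shape
    codes = codes.copy()
    y = x @ w.T  # full-precision target outputs, (n, out)
    for _ in range(sweeps):
        for r in range(w.shape[0]):
            # Current quantized row (fancy indexing returns a copy).
            w_hat = codebooks[np.arange(m), codes[r]].reshape(-1)
            for g in range(m):
                cols = slice(g * d, (g + 1) * d)
                # Row output with group g's contribution removed.
                base = x @ w_hat - x[:, cols] @ w_hat[cols]
                # Output for every candidate codeword of group g: (n, k).
                cand = base[:, None] + x[:, cols] @ codebooks[g].T
                err = ((y[:, r][:, None] - cand) ** 2).sum(0)
                # The current codeword is a candidate, so error never increases.
                codes[r, g] = int(err.argmin())
                w_hat[cols] = codebooks[g, codes[r, g]]
    return codes
```

Because each step keeps the current codeword whenever no candidate does better, the output reconstruction error of every row is non-increasing across sweeps; a practical implementation would likely vectorize over rows.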