BiDM: Pushing the Limit of Quantization for Diffusion Models

Xingyu Zheng, Xianglong Liu, Yichen Bian, Xudong Ma, Yulun Zhang, Jiakai Wang, Jinyang Guo, Haotong Qin

TL;DR

BiDM tackles the challenge of fully binarizing diffusion models, covering both weights and activations, to maximize compression and speed. It introduces two core innovations: Timestep-friendly Binary Structure (TBS), which adapts activation binarization across timesteps, and Space Patched Distillation (SPD), which guides the binary model via patch-wise distillation from a full-precision teacher. Empirically, BiDM substantially outperforms prior fully binarized baselines on both pixel-space and latent-space diffusion models, achieving an FID of 22.74 with W1A1 LDM-4 on LSUN-Bedrooms and enabling up to 28x storage and 52.7x OPs savings while still generating viewable images. The work shows that extreme 1-bit diffusion models can be made practical, offering a path toward efficient generative modeling in resource-constrained settings.
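As a rough reading of the storage figure, the sketch below computes the compression ratio when some fraction of 32-bit parameters is stored at 1 bit and the remainder stays at 32 bits. This two-way split and the helper names `compression_ratio` and `binarized_fraction_for` are assumptions made for illustration; the paper's own accounting may differ.

```python
FP_BITS = 32   # bits per full-precision parameter
BIN_BITS = 1   # bits per binarized parameter


def compression_ratio(binarized_fraction, fp_bits=FP_BITS, low_bits=BIN_BITS):
    """Storage ratio (original / compressed) under the assumed two-way split.

    binarized_fraction: share of parameters stored at `low_bits`;
    the remaining parameters are assumed to stay at `fp_bits`.
    """
    avg_bits = binarized_fraction * low_bits + (1.0 - binarized_fraction) * fp_bits
    return fp_bits / avg_bits


def binarized_fraction_for(ratio, fp_bits=FP_BITS, low_bits=BIN_BITS):
    """Invert compression_ratio: the fraction that yields the given ratio."""
    return (fp_bits - fp_bits / ratio) / (fp_bits - low_bits)


if __name__ == "__main__":
    # Fully binarized storage hits the theoretical ceiling of fp_bits / low_bits.
    print(compression_ratio(1.0))
    # Fraction of parameters that would need to be 1-bit to reach 28x.
    print(binarized_fraction_for(28.0))
```

Under this simplified model, a 28x ratio would correspond to roughly 99.5% of parameters being binarized, which sits just below the 32x ceiling of a pure 32-bit to 1-bit conversion.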

Abstract

Diffusion models (DMs) have advanced rapidly and are widely used across applications thanks to their excellent generative quality. However, their expensive computation and large parameter counts hinder practical use in resource-constrained scenarios. Quantization, an effective compression approach, lets DMs save storage and accelerate inference by reducing bit-width while maintaining generation performance. However, 1-bit binarization, the most extreme form of quantization, causes the generation performance of DMs to degrade severely or even collapse. This paper proposes a novel method, BiDM, for fully binarizing the weights and activations of DMs, pushing quantization to the 1-bit limit. From a temporal perspective, we introduce the Timestep-friendly Binary Structure (TBS), which uses learnable activation binarizers and cross-timestep feature connections to address the highly timestep-correlated activation features of DMs. From a spatial perspective, we propose Space Patched Distillation (SPD) to address the difficulty of matching binary features during distillation, focusing on the spatial locality of image generation tasks and noise estimation networks. As the first work to fully binarize DMs, the W1A1 BiDM on the LDM-4 model for LSUN-Bedrooms 256$\times$256 achieves a remarkable FID of 22.74, significantly outperforming the current state-of-the-art general binarization methods, which reach an FID of 59.44 and produce invalid generative samples. It also achieves up to 28.0 times storage savings and 52.7 times OPs savings. The code is available at https://github.com/Xingyu-Zheng/BiDM .
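To make the W1A1 setting concrete, the following is a minimal sketch of generic 1-bit binarization and of why it reduces operations: once both operands are in {-1, +1}, a dot product can be computed with XNOR and a bit count. The sign binarizer with a mean-absolute-value scale is a common baseline choice and is assumed here for illustration; it is not BiDM's learnable, timestep-aware binarizer. All function names are hypothetical.

```python
def binarize(values):
    """Generic sign binarization (an assumed baseline, not BiDM's TBS).

    Returns (alpha, signs) where signs are in {-1, +1} and alpha is the
    mean absolute value, so alpha * signs approximates the input.
    """
    alpha = sum(abs(v) for v in values) / len(values)
    signs = [1 if v >= 0 else -1 for v in values]
    return alpha, signs


def pack_bits(signs):
    """Pack a {-1, +1} vector into an int: bit i is set when signs[i] == +1."""
    bits = 0
    for i, s in enumerate(signs):
        if s == 1:
            bits |= 1 << i
    return bits


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    Positions where the bits agree contribute +1 and the rest -1, so
    dot = agreements - (n - agreements) = 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR restricted to the low n bits
    return 2 * bin(agree).count("1") - n


if __name__ == "__main__":
    weights = [0.5, -1.5, 2.0, -0.25]
    activations = [-0.3, -0.7, 1.2, 0.9]
    alpha_w, sw = binarize(weights)
    alpha_a, sa = binarize(activations)
    # The bitwise result matches the plain dot product of the sign vectors.
    reference = sum(x * y for x, y in zip(sw, sa))
    fast = xnor_popcount_dot(pack_bits(sw), pack_bits(sa), len(sw))
    assert fast == reference
    # Rescale to approximate the full-precision dot product.
    print(alpha_w * alpha_a * fast)
```

The scaling factors are applied once per output rather than per multiplication, so the inner loop involves only bitwise operations; this is the general source of the OPs savings that binarized networks report.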

Paper Structure

This paper contains 20 sections, 16 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Overview of BiDM with Timestep-friendly Binary Structure, which improves DM architecture temporally, and Space Patched Distillation, which enhances DM optimization spatially.
  • Figure 2: (a) The activation range of the 4th convolutional layer of the full-precision DDIM model on CIFAR-10 varies with the denoising timestep. (b) At each step, the output features of the full-precision LDM-4 model on LSUN-Bedrooms are similar to those of the previous step.
  • Figure 3: An illustration of TBS. Since the feature space is high-dimensional, we illustrate it using schematic diagrams.
  • Figure 4: Visualization of the output of the last TimeStepBlock of the LDM model on the LSUN-Bedrooms dataset. FP32 denotes the full-precision model's output $\mathcal{F}^{\text{fp}}$. Diff denotes the difference between the outputs of the full-precision model and the binarized one, $\left \| \mathcal{F}^{\text{fp}}-\mathcal{F}^{\text{bi}}\right \|$. Ours denotes the attention-guided SPD.
  • Figure 5: Visualization of samples generated by the W1A1 baseline and our BiDM. BiDM is the first fully binarized DM method capable of generating viewable images, significantly surpassing advanced binarization methods.
  • ...and 4 more figures