BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models

Xingyu Zheng, Xianglong Liu, Haotong Qin, Xudong Ma, Mingyuan Zhang, Haojie Hao, Jiakai Wang, Zixiang Zhao, Jinyang Guo, Michele Magno

TL;DR

This work addresses the challenge of deploying diffusion models with ultra-low-bit weights by introducing BinaryDM, which combines an Evolvable-Basis Binarizer (EBB) to expand early representation capacity and Low-rank Representation Mimicking (LRM) to stabilize optimization. EBB starts from a learnable multi-basis representation and, under regularization, gradually collapses it to a single binarized basis. It is applied only to the head and tail of the architecture, which keeps training stable and avoids excessive overhead. LRM projects intermediate representations into a fixed low-rank space via a PCA-based projection and aligns the full-precision and binarized networks with a distillation-like loss, improving convergence and robustness. With both components, BinaryDM outperforms state-of-the-art quantization methods at ultra-low bit-widths: with 1-bit weights and 4-bit activations (W1A4) it reaches an FID as low as 7.74 (baseline 10.87) while saving 29.2x in model size and 15.2x in OPs, demonstrating strong potential for edge deployment of diffusion-based generation.
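The multi-basis-to-single-basis idea behind EBB can be illustrated with a small NumPy sketch. This is not the authors' implementation: the residual-fitting form of the second basis, the closed-form scales (the paper describes the bases as learnable), and the function names `single_basis_binarize`, `two_basis_binarize`, and `reconstruction_error` are all assumptions made for illustration. The sketch only shows why a second binary basis represents the weights more closely, and how driving its scale to zero recovers plain single-basis binarization.

```python
import numpy as np


def single_basis_binarize(w):
    """One scale times one sign pattern: the plain 1-bit form."""
    centered = w - w.mean()
    scale = np.abs(centered).mean()  # least-squares scale for a sign basis
    return scale * np.sign(centered)


def two_basis_binarize(w, sigma_2=None):
    """Hypothetical early-stage form: a second sign basis fits the residual."""
    centered = w - w.mean()
    sigma_1 = np.abs(centered).mean()
    first = sigma_1 * np.sign(centered)
    residual = centered - first
    if sigma_2 is None:
        # Assumption: closed-form scale; a trained model would learn this.
        sigma_2 = np.abs(residual).mean()
    return first + sigma_2 * np.sign(residual)


def reconstruction_error(w, w_hat):
    """Mean squared error against the mean-centered weights."""
    return float(np.mean((w - w.mean() - w_hat) ** 2))


rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64))

err_single = reconstruction_error(weights, single_basis_binarize(weights))
err_double = reconstruction_error(weights, two_basis_binarize(weights))

# Setting the second scale to zero collapses to the single-basis result.
collapsed = two_basis_binarize(weights, sigma_2=0.0)

print("single-basis error:", err_single)
print("two-basis error:   ", err_double)
```

In this sketch, a regularizer that shrinks `sigma_2` toward zero during training would play the role of the evolution from the multi-basis stage to the single-basis stage.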

Abstract

With the advancement of diffusion models (DMs) and the substantially increased computational requirements, quantization emerges as a practical solution to obtain compact and efficient low-bit DMs. However, the highly discrete representation leads to severe accuracy degradation, hindering the quantization of diffusion models to ultra-low bit-widths. This paper proposes a novel weight binarization approach for DMs, namely BinaryDM, pushing binarized DMs to be accurate and efficient by improving the representation and optimization. From the representation perspective, we present an Evolvable-Basis Binarizer (EBB) to enable a smooth evolution of DMs from full-precision to accurately binarized. EBB enhances information representation in the initial stage through the flexible combination of multiple binary bases and applies regularization to evolve into efficient single-basis binarization. The evolution only occurs in the head and tail of the DM architecture to retain the stability of training. From the optimization perspective, a Low-rank Representation Mimicking (LRM) is applied to assist the optimization of binarized DMs. The LRM mimics the representations of full-precision DMs in low-rank space, alleviating the direction ambiguity of the optimization process caused by fine-grained alignment. Comprehensive experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths. With 1-bit weight and 4-bit activation (W1A4), BinaryDM achieves as low as 7.74 FID and saves the performance from collapse (baseline FID 10.87). As the first binarization method for diffusion models, W1A4 BinaryDM achieves impressive 15.2x OPs and 29.2x model size savings, showcasing its substantial potential for edge deployment. The code is available at https://github.com/Xingyu-Zheng/BinaryDM.
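As a rough illustration of mimicking in a low-rank space, the sketch below fits a PCA projection on full-precision features once, keeps it fixed, and measures the mean squared difference between projected full-precision and binarized features. It is a minimal sketch under stated assumptions, not the paper's code: the feature shapes, the rank of 8, the use of SVD to obtain the principal directions, and the names `fit_pca_projection` and `low_rank_mimic_loss` are hypothetical.

```python
import numpy as np


def fit_pca_projection(fp_features, rank):
    """Top-`rank` principal directions of the features, shape (dim, rank)."""
    centered = fp_features - fp_features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T


def low_rank_mimic_loss(fp_features, bin_features, projection):
    """Mean squared difference after projecting both feature sets."""
    diff = (fp_features - bin_features) @ projection
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
fp = rng.normal(size=(256, 32))                    # stand-in full-precision features
binarized = fp + 0.1 * rng.normal(size=fp.shape)   # stand-in binarized features

# Fit the projection once on full-precision features, then keep it fixed.
projection = fit_pca_projection(fp, rank=8)
loss = low_rank_mimic_loss(fp, binarized, projection)
print("low-rank mimicking loss:", loss)
```

Because the loss only compares 8 projected coordinates instead of all 32 feature dimensions, the binarized network is asked to match the dominant directions of the full-precision features rather than every fine-grained detail.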

Paper Structure (16 sections, 19 equations, 9 figures, 17 tables)

Figures (9)

  • Figure 1: Overview of BinaryDM, consisting of Evolvable-Basis Binarizer to enhance information representation and Low-rank Representation Mimicking to improve optimization direction.
  • Figure 2: Comparison of binarized weights (channel-wise) for a convolutional layer. EBB possesses a broader representation range at the early stage and then gradually transitions to a single-basis state, while the quantitative information entropy $\mathcal{H}$ further illustrates its enhanced representation capacity.
  • Figure 3: The impact of different distillation loss functions on the output features of each block in both full-precision DM and binarized DM, measured by the $\mathcal{L}_2$ distance. Our proposed LRM enables the binarized DM to have the best information-mimicking capability.
  • Figure 4: Visualization of samples generated by the binarized DM baseline and W1A4 BinaryDM.
  • Figure 5: A comprehensive record of the impact of different distillation loss functions on the output features of each block in both full-precision DM and binarized DM, measured using the $\mathcal{L}_2$ distance.
  • ...and 4 more figures