BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution

Kai Liu, Kaicheng Yang, Zheng Chen, Zhiteng Li, Yong Guo, Wenbo Li, Linghe Kong, Yulun Zhang

TL;DR

BiMaCoSR tackles real-world super-resolution (Real SR) by compressing diffusion models through 1-bit binarization and one-step distillation. It introduces two auxiliary branches, LRMB and SMB, that capture low-frequency and high-rank information while preserving FP priors, and it uses SVD-based initialization to reuse pretrained weights. Results on RealSR, DRealSR, and DIV2K-Val show large compression ($\approx 23.8\times$) and speedup ($\approx 27.4\times$) relative to the FP counterpart with competitive restoration quality, supported by ablations and visualizations. The work moves diffusion-based SR toward practical deployment on edge devices and offers a blueprint for further matrix-compression strategies in generative vision models.

Abstract

Although super-resolution (SR) methods based on diffusion models (DM) have demonstrated impressive performance, their deployment is impeded by heavy memory and computation demands. Recent work compresses or accelerates DMs in two ways. One compresses the DM to 1 bit, also known as binarization, relieving storage and computation pressure. The other distills the multi-step DM into a single step, significantly speeding up inference. Nonetheless, it remains impossible to deploy DMs on resource-limited edge devices. To address this problem, we propose BiMaCoSR, which combines binarization and one-step distillation to obtain extreme compression and acceleration. To prevent the catastrophic collapse of the model caused by binarization, we propose a sparse matrix branch (SMB) and a low rank matrix branch (LRMB). Both auxiliary branches pass full-precision (FP) information, but in different ways. SMB absorbs the extreme values and its output is high rank, carrying abundant FP information. In contrast, LRMB is inspired by LoRA and is initialized with the top r SVD components, producing a low-rank representation. The computation and storage overhead of the proposed branches is negligible. Comprehensive comparison experiments show that BiMaCoSR outperforms current state-of-the-art binarization methods and achieves competitive performance compared with the FP one-step model. BiMaCoSR achieves a 23.8x compression ratio and a 27.4x speedup ratio compared to its FP counterpart. Our code and model are available at https://github.com/Kai-Liu001/BiMaCoSR.
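To make the three-branch idea concrete, here is a minimal NumPy sketch of how a linear layer might combine a binarized weight with a low-rank branch and a sparse branch, together with a rough per-layer storage estimate. It is an illustration under stated assumptions, not the authors' implementation: the function names (`three_branch_forward`, `layer_bits`), the per-row scale `alpha`, the 32-bit value and index formats, and the layer sizes are all hypothetical, and the numbers it produces are not meant to reproduce the reported 23.8x ratio.

```python
import numpy as np

def three_branch_forward(x, W_sign, alpha, A, B, S):
    """Compute x @ (alpha * W_sign + A @ B + S).T one branch at a time.

    x:      (n, d_in) input
    W_sign: (d_out, d_in) matrix of +1/-1 (binarized branch)
    alpha:  (d_out, 1) per-row scale (an assumed scaling scheme)
    A, B:   (d_out, r) and (r, d_in) low-rank factors
    S:      (d_out, d_in) matrix that is mostly zeros (sparse branch)
    """
    y_bin = (x @ W_sign.T) * alpha.T   # binarized branch
    y_lr = (x @ B.T) @ A.T             # low-rank branch, never forms A @ B
    y_sp = x @ S.T                     # sparse branch (stored densely here)
    return y_bin + y_lr + y_sp

def layer_bits(d_out, d_in, r, k, fp_bits=32, index_bits=32):
    """Rough storage in bits: full-precision layer vs. three-branch layer."""
    full_precision = d_out * d_in * fp_bits
    binary = d_out * d_in + d_out * fp_bits   # 1-bit weights + per-row scales
    low_rank = r * (d_out + d_in) * fp_bits   # two thin FP factors
    sparse = k * (fp_bits + index_bits)       # k FP values + their flat indices
    return full_precision, binary + low_rank + sparse

# Toy usage with random placeholders (not trained weights).
rng = np.random.default_rng(0)
d_out, d_in, r = 6, 5, 2
x = rng.standard_normal((3, d_in))
W_sign = np.where(rng.standard_normal((d_out, d_in)) >= 0, 1.0, -1.0)
alpha = np.abs(rng.standard_normal((d_out, 1)))
A = rng.standard_normal((d_out, r))
B = rng.standard_normal((r, d_in))
S = np.zeros((d_out, d_in))
S[0, 1] = 2.5                          # a single "extreme" entry
y = three_branch_forward(x, W_sign, alpha, A, B, S)

# Hypothetical layer size, rank, and sparsity; chosen only for illustration.
fp_bits_total, compressed_bits_total = layer_bits(1280, 1280, r=8, k=1280)
print(fp_bits_total / compressed_bits_total)
```

The low-rank branch is applied as two thin matrix products, so its cost grows with r rather than with the full layer size. This is one way to see why small r and few sparse entries keep the auxiliary overhead low.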

Paper Structure

This paper contains 17 sections, 12 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Performance comparison between binarization methods on the RealSR dataset. BiMaCoSR achieves consistently leading scores on all evaluation metrics.
  • Figure 2: Overview of our proposed BiMaCoSR, which employs three different compressed matrix branches. (a) The structure of a convolution layer in BiMaCoSR after binarization. Two auxiliary branches, i.e., LRMB and SMB, support BiMaCoSR's excellent performance. A linear layer can be regarded as a $1\times1$ convolution layer and is processed with the same pipeline. (b) Illustration of the initialization sequence and how the three branches compensate for one another's weaknesses.
  • Figure 3: Initialization of the different branches. $W_{res}$ represents the initial quantization error. In our method, $\| W_{res} \|_F^2 = 0.1855$, while $\| W_{res} \|_F^2 = 1.1275$ in direct binarization.
  • Figure 4: Visual comparison for image SR. We compare our proposed BiMaCoSR with current competitive binarization methods and the full-precision (FP) model. The visual results illustrate that BiMaCoSR produces rich details and reasonable textures.
  • Figure 5: The proportion of high-frequency information generated by the three branches. The high-frequency information mainly comes from BMB, which is consistent with our assumption.
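
The residual comparison in the Figure 3 caption can be illustrated with a small NumPy sketch. It assumes one plausible initialization order (top-r SVD for the low-rank branch, then the largest-magnitude leftovers for the sparse branch, then sign binarization with a per-row scale for the remainder) and a synthetic weight matrix. The helper names, the values of `r` and `k`, and the scaling rule are assumptions for illustration, so the printed numbers will not match the 0.1855 and 1.1275 reported in the caption.

```python
import numpy as np

def low_rank_init(W, r):
    """Top-r SVD components of W, returned as two thin factors."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]      # (d_out, r), singular values folded in
    B = Vt[:r, :]             # (r, d_in)
    return A, B

def sparse_init(R, k):
    """Keep the k largest-magnitude entries of R and zero the rest."""
    S = np.zeros_like(R)
    if k > 0:
        idx = np.argpartition(np.abs(R).ravel(), -k)[-k:]
        S.flat[idx] = R.flat[idx]
    return S

def binarize(R):
    """Sign binarization with one scale per row (mean absolute value)."""
    alpha = np.abs(R).mean(axis=1, keepdims=True)
    return alpha * np.where(R >= 0, 1.0, -1.0)

def residual_sq_norm(W, r, k):
    """Squared Frobenius norm of what is left after all three branches."""
    A, B = low_rank_init(W, r)
    R1 = W - A @ B                    # what the low-rank branch misses
    R2 = R1 - sparse_init(R1, k)      # what the sparse branch also misses
    return float(np.linalg.norm(R2 - binarize(R2)) ** 2)

# Synthetic weights: low-rank structure + small noise + a few extreme values.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64)) / 8.0
W += 0.05 * rng.standard_normal((64, 64))
W[rng.integers(0, 64, 16), rng.integers(0, 64, 16)] += 3.0

err_direct = residual_sq_norm(W, r=0, k=0)      # binarize W directly
err_branches = residual_sq_norm(W, r=8, k=64)   # low-rank, sparse, then binary
print(err_direct, err_branches)
```

With `r=0` and `k=0` the function reduces to direct binarization, so the two calls compare the same matrix with and without the auxiliary branches.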