PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution

Libo Zhu, Jianze Li, Haotong Qin, Wenbo Li, Yulun Zhang, Yong Guo, Xiaokang Yang

TL;DR

This work tackles the deployment bottlenecks of one-step diffusion-based image super-resolution (OSDSR) by introducing PassionSR, a post-training quantization framework tailored for OSDSR. It simplifies the OSDSR backbone to a UNet and a VAE, and introduces a Learnable Boundary Quantizer (LBQ) and a Learnable Equivalent Transformation (LET) to adapt activation and weight distributions for low-bit quantization, complemented by a Distributed Quantization Calibration (DQC) strategy for stable, rapid convergence. Experiments show that PassionSR achieves perceptual parity with full-precision models at 8-bit and maintains strong performance at 6-bit, with substantial compression (approximately 80-85% fewer parameters and 76-82% fewer operations) compared to the baseline diffusion model. The approach offers a practical path to hardware-friendly, high-quality OSDSR for both real-time and offline SR applications.
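
As a rough illustration of why quantizer boundaries matter at low bit widths, the sketch below applies plain uniform quantization with explicit clipping bounds. The function names, the toy activations, and the hand-picked bounds are assumptions made for illustration only. In LBQ the boundaries are learned during calibration, and this sketch does not reproduce that learning step or the paper's exact formulation.

```python
def fake_quantize(values, lower, upper, n_bits):
    """Uniformly quantize values to n_bits inside [lower, upper], then dequantize.

    Hypothetical helper: the bounds are plain arguments here, whereas a
    learnable-boundary quantizer would treat them as trainable parameters.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    levels = 2 ** n_bits - 1               # number of quantization steps
    scale = (upper - lower) / levels       # width of one step
    out = []
    for v in values:
        clipped = min(max(v, lower), upper)        # clip to the boundaries
        code = round((clipped - lower) / scale)    # integer code in [0, levels]
        out.append(code * scale + lower)           # map back to real values
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy activations (assumed data): a small-valued bulk plus one large outlier.
acts = [-0.9, -0.4, -0.1, 0.0, 0.2, 0.5, 0.8, 12.0]

# Min-max bounds let the outlier stretch the range, so steps become coarse.
minmax = fake_quantize(acts, min(acts), max(acts), n_bits=4)

# Tighter, hand-picked bounds clip the outlier but resolve the bulk finely.
tight = fake_quantize(acts, -1.0, 1.0, n_bits=4)

print("bulk MSE, min-max bounds:", mse(acts[:-1], minmax[:-1]))
print("bulk MSE, tight bounds:  ", mse(acts[:-1], tight[:-1]))
```

The tight bounds trade a large error on the single outlier for a much smaller error on the remaining values, which is the kind of trade-off a boundary-learning procedure would have to balance.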

Abstract

Diffusion-based image super-resolution (SR) models have shown superior performance at the cost of multiple denoising steps. However, even when the number of denoising steps is reduced to one, they still incur high computational and storage costs, making them difficult to deploy on hardware devices. To address these issues, we propose PassionSR, a novel post-training quantization approach with adaptive scale for one-step diffusion (OSD) image SR. First, we simplify the OSD model to two core components, the UNet and the Variational Autoencoder (VAE), by removing the CLIPEncoder. Second, we propose a Learnable Boundary Quantizer (LBQ) and a Learnable Equivalent Transformation (LET) to optimize the quantization process and manipulate activation distributions for better quantization. Finally, we design a Distributed Quantization Calibration (DQC) strategy that stabilizes the training of quantized parameters for rapid convergence. Comprehensive experiments demonstrate that PassionSR at 8-bit and 6-bit obtains visual results comparable to those of the full-precision model. Moreover, PassionSR achieves significant advantages over recent leading low-bit quantization methods for image SR. Our code will be at https://github.com/libozhu03/PassionSR.
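
To make the idea of an equivalent transformation concrete, the sketch below uses per-channel scaling, one common form in the quantization literature: activations are divided by a scale and the matching weight rows are multiplied by it, so the layer output is unchanged while the activation range shrinks. The abstract does not spell out the form LET takes, so the helper names, the toy values, and the fixed scales are all assumptions. In a learnable variant the scales would be optimized during calibration rather than set by hand.

```python
def linear(x, weight):
    """Compute y[j] = sum_i x[i] * weight[i][j] for a single input vector."""
    n_out = len(weight[0])
    return [sum(x[i] * weight[i][j] for i in range(len(x))) for j in range(n_out)]


def apply_equivalent_scaling(x, weight, scales):
    """Divide each input channel by its scale and multiply the matching weight row.

    Hypothetical helper: (x / s) @ (s * W) equals x @ W, so the layer output
    is preserved while outlier channels in x are flattened.
    """
    x_scaled = [xi / s for xi, s in zip(x, scales)]
    w_scaled = [[wij * s for wij in row] for row, s in zip(weight, scales)]
    return x_scaled, w_scaled


# Toy layer (assumed data): the third input channel carries an outlier.
x = [0.5, -0.25, 16.0]
weight = [
    [0.2, -0.1],
    [0.4, 0.3],
    [0.01, -0.02],
]
scales = [1.0, 1.0, 16.0]  # hand-picked here; a learnable variant would tune these

x_s, w_s = apply_equivalent_scaling(x, weight, scales)
y_ref = linear(x, weight)
y_eq = linear(x_s, w_s)

print("activation range before:", max(abs(v) for v in x))
print("activation range after: ", max(abs(v) for v in x_s))
print("outputs match:", all(abs(a - b) < 1e-9 for a, b in zip(y_ref, y_eq)))
```

Moving magnitude from activations into weights makes the activations easier to quantize at the cost of a wider weight range, so the choice of scales is a balance between the two.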

Paper Structure

This paper contains 20 sections, 8 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Visual comparison ($\times$4) between full-precision (FP) multi-step and one-step diffusion SR models and our 8-bit quantized PassionSR. Compared to FP models, PassionSR achieves about 81.77% parameter reduction and a 4$\times$ speedup.
  • Figure 2: Visual comparison ($\times$4) of one-step diffusion SR models. We use OSEDiff as a 32-bit full-precision (FP) reference and provide 6-bit quantized versions produced by different methods.
  • Figure 3: Diffusion-based image SR acceleration.
  • Figure 4: Overview of our PassionSR. Step 1: we simplify OSEDiff by removing DAPE and the CLIP Encoder, obtaining PassionSR-FP. Step 2: the quantizer we use has two key trainable parts, the Learnable Boundary Quantizer and the Learnable Equivalent Transformation. Step 3: we design a distributed calibration strategy and a special loss function to accelerate convergence of calibration.
  • Figure 5: Loss comparison with and without DQC.
  • ...and 4 more figures