QArtSR: Quantization via Reverse-Module and Timestep-Retraining in One-Step Diffusion based Image Super-Resolution

Libo Zhu, Haotong Qin, Kaicheng Yang, Wenbo Li, Yong Guo, Yulun Zhang, Susanto Rahardja, Xiaokang Yang

TL;DR

QArtSR targets ultra-low-bit (4-bit and 2-bit) quantization of one-step diffusion-based image super-resolution (OSDSR) models. It combines Timestep Retraining Quantization (TRQ) and Reversed Per-module Quantization (RPQ) with a specialized finetuning quantizer and extended end-to-end training (ET). TRQ selects an optimal timestep and retrains the model at that timestep to minimize quantization error; RPQ reverses the quantization order to align module- and image-level losses; ET ensures that all quantized modules are fully finetuned. Empirically, 4-bit QArtSR approaches full-precision (FP) performance and 2-bit quantization remains robust across datasets, with 90–95% reductions in parameters and operations compared to the FP backbone, and QArtSR outperforms competing diffusion-quantization methods. This makes high-quality OSDSR more practical to deploy on resource-constrained devices.

Abstract

One-step diffusion-based image super-resolution (OSDSR) models are showing increasingly strong performance. Although their denoising steps are reduced to one and they can be quantized to 8 bits to further reduce cost, there is still significant potential to quantize OSDSR models to lower bit-widths. To explore this, we propose an efficient method, Quantization via reverse-module and timestep-retraining for OSDSR, named QArtSR. First, we investigate the influence of the timestep value on the performance of quantized models. Then, we propose Timestep Retraining Quantization (TRQ) and Reversed Per-module Quantization (RPQ) strategies to calibrate the quantized model. We adopt module and image losses to update all quantized modules, and we update only the parameters of the quantization finetuning components, leaving the original weights unchanged. To ensure that all modules are fully finetuned, we add extended end-to-end training after the per-module stage. Our 4-bit and 2-bit quantization experiments indicate that QArtSR outperforms recent leading methods, and the performance of 4-bit QArtSR is close to that of the full-precision model. Our code will be released at https://github.com/libozhu03/QArtSR.
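To give a feel for why moving from 8-bit to 4-bit and 2-bit is hard, the sketch below applies a generic uniform min-max fake quantizer to synthetic Gaussian "weights" and compares the reconstruction error at each bit-width. This is an illustrative assumption, not the QArtSR quantizer: the function names `fake_quantize` and `mean_squared_error`, the min-max scaling, and the random data are all hypothetical and are not taken from the paper.

```python
import random


def fake_quantize(values, bits):
    """Generic uniform min-max fake quantization (NOT the paper's quantizer).

    Maps each value to one of 2**bits evenly spaced levels between the
    minimum and maximum of `values`, then maps it back to a float.
    """
    levels = 2 ** bits - 1                      # steps between min and max
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    out = []
    for v in values:
        q = round((v - lo) / scale)             # integer code
        q = max(0, min(levels, q))              # clamp to the valid range
        out.append(lo + q * scale)              # dequantize back to float
    return out


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic stand-in for a weight tensor (assumption: roughly Gaussian).
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

errors = {
    bits: mean_squared_error(weights, fake_quantize(weights, bits))
    for bits in (8, 4, 2)
}
for bits, err in errors.items():
    print(f"{bits}-bit: {2 ** bits} levels, MSE = {err:.6f}")
```

With only 16 levels at 4 bits and 4 levels at 2 bits, the rounding error grows sharply. This is the gap that calibration and finetuning strategies such as those described in the abstract aim to close.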

Paper Structure

This paper contains 16 sections, 9 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Visual results of full-precision (FP) and low-bit multi-step and/or one-step diffusion SR models. Ops are computed with output size 512$\times$512. Compared to FP OSEDiff, QArtSR achieves about 90.66% params reduction and 8$\times$ speedup.
  • Figure 2: Performance visualization of low-bit quantized OSDSR methods at the W4A4 setting on Urban100 (Huang et al., CVPR 2015).
  • Figure 3: Visual comparison ($\times$4) of 32-bit OSEDiff (Wu et al., 2024) and 2-bit models quantized with various methods.
  • Figure 4: Overview of our QArtSR. Stage 1: we study the relationship between timestep and quantization error, and retrain the OSDSR model with the best timestep $T$ before quantization. Stage 2: we propose a reversed per-module quantization strategy to make quantization finetuning smoother. Stage 3: we perform extended end-to-end training to further enhance performance.
  • Figure 5: Values of $\alpha$ and $\lambda$ at different timesteps $T$.
  • ...and 2 more figures