QArtSR: Quantization via Reverse-Module and Timestep-Retraining in One-Step Diffusion based Image Super-Resolution
Libo Zhu, Haotong Qin, Kaicheng Yang, Wenbo Li, Yong Guo, Yulun Zhang, Susanto Rahardja, Xiaokang Yang
TL;DR
QArtSR targets ultra-low-bit quantization of one-step diffusion-based image super-resolution (OSDSR) models. It combines Timestep Retraining Quantization (TRQ) and Reversed Per-module Quantization (RPQ) with a dedicated finetuning quantizer and an extended end-to-end training (ET) stage. TRQ selects an optimal timestep and retrains at it to minimize quantization error; RPQ reverses the order in which modules are quantized so that module-level and image-level losses align; ET ensures that all quantized modules are fully finetuned. Empirically, 4-bit QArtSR approaches full-precision (FP) performance and 2-bit quantization remains robust across datasets, with 90–95% reductions in parameters and operations relative to the FP backbone, and it outperforms competing diffusion-quantization methods. The work enables practical deployment of high-quality OSDSR on resource-constrained devices and advances the state of ultra-low-bit diffusion model quantization.
Abstract
One-step diffusion-based image super-resolution (OSDSR) models are achieving increasingly strong performance. Although their denoising process has been reduced to a single step and they can be quantized to 8 bits to further reduce cost, there remains significant potential to quantize OSDSR models to even lower bit-widths. To explore this, we propose an efficient method, Quantization via reverse-module and timestep-retraining for OSDSR, named QArtSR. First, we investigate how the timestep value affects the performance of quantized models. We then propose Timestep Retraining Quantization (TRQ) and Reversed Per-module Quantization (RPQ) strategies to calibrate the quantized model. We adopt both module-level and image-level losses to update all quantized modules, updating only the parameters of the quantization finetuning components and leaving the original weights unchanged. To ensure that all modules are fully finetuned, we add an extended end-to-end training stage after the per-module stage. Our 4-bit and 2-bit quantization experiments show that QArtSR outperforms recent leading methods, and the performance of 4-bit QArtSR is close to that of the full-precision model. Our code will be released at https://github.com/libozhu03/QArtSR.
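As a rough illustration of what calibrating a low-bit module against a module-level loss with frozen weights can look like, the sketch below fake-quantizes a toy linear module to 4 and 2 bits and picks a quantizer scale by grid search. It is not the QArtSR implementation: the symmetric uniform quantizer, the grid search (standing in for gradient-based finetuning of quantizer parameters), the toy data, and all names (fake_quantize, module_loss, calibrate_scale) are assumptions made for illustration, and TRQ, RPQ, and the image-level loss are not modeled.

```python
def fake_quantize(values, bits, scale):
    """Symmetric uniform fake quantization: snap to a signed integer grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1      # 7 for 4-bit, 1 for 2-bit
    qmin = -(2 ** (bits - 1))       # -8 for 4-bit, -2 for 2-bit
    out = []
    for v in values:
        q = round(v / scale)        # nearest integer (Python rounds ties to even)
        q = max(qmin, min(qmax, q)) # clip to the representable range
        out.append(q * scale)
    return out


def linear(weights, x):
    """Toy 'module': a single dot product."""
    return sum(w * xi for w, xi in zip(weights, x))


def module_loss(weights, q_weights, calib_inputs):
    """Mean squared error between full-precision and quantized module outputs."""
    total = 0.0
    for x in calib_inputs:
        diff = linear(weights, x) - linear(q_weights, x)
        total += diff * diff
    return total / len(calib_inputs)


def calibrate_scale(weights, bits, calib_inputs, candidates):
    """Pick the quantizer scale with the lowest module loss; weights are never modified."""
    best = None
    for s in candidates:
        loss = module_loss(weights, fake_quantize(weights, bits, s), calib_inputs)
        if best is None or loss < best[1]:
            best = (s, loss)
    return best


# Hypothetical toy data, not taken from the paper.
weights = [0.82, -0.41, 0.13, -0.95, 0.27, 0.60]
calib_inputs = [
    [1.0, 0.5, -0.3, 0.8, -1.2, 0.1],
    [-0.4, 0.9, 0.7, -0.6, 0.2, 1.1],
    [0.3, -1.0, 0.5, 0.4, 0.9, -0.7],
]
candidates = [0.05 * k for k in range(1, 41)]  # scales from 0.05 to 2.0

for bits in (4, 2):
    scale, loss = calibrate_scale(weights, bits, calib_inputs, candidates)
    print(f"{bits}-bit: scale={scale:.2f}, module loss={loss:.6f}")
```

Only the quantizer scale is searched here; the full-precision weights stay fixed, which mirrors the idea of updating quantization components while excluding the original weights.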
