QuantVSR: Low-Bit Post-Training Quantization for Real-World Video Super-Resolution

Bowen Chai, Zheng Chen, Libo Zhu, Wenbo Li, Yong Guo, Yulun Zhang

TL;DR

QuantVSR addresses the deployment cost of diffusion-based video super-resolution (VSR) with a low-bit post-training quantization framework. It pairs a spatio-temporal complexity aware (STCA) mechanism with a learnable bias alignment (LBA) module, built around a dual-branch quantization layer: a low-rank full-precision (FP) auxiliary branch alongside a 4- to 6-bit branch. The method allocates layer-specific ranks from calibration-data statistics, jointly refines the FP and low-bit branches, and trains a small bias adaptor to reduce biased quantization error. Experiments on synthetic and real-world VSR datasets show performance close to the FP model at 4-bit and significant improvements over existing quantization methods. The stated aim is efficient deployment of diffusion-based VSR models on edge devices and in other resource-constrained settings. Code is available at the repository linked in the abstract.

Abstract

Diffusion models have shown superior performance in real-world video super-resolution (VSR). However, the slow processing speeds and heavy resource consumption of diffusion models hinder their practical application and deployment. Quantization offers a potential solution for compressing the VSR model. Nevertheless, quantizing VSR models is challenging due to their temporal characteristics and high fidelity requirements. To address these issues, we propose QuantVSR, a low-bit quantization model for real-world VSR. We propose a spatio-temporal complexity aware (STCA) mechanism, where we first utilize the calibration dataset to measure both spatial and temporal complexities for each layer. Based on these statistics, we allocate layer-specific ranks to the low-rank full-precision (FP) auxiliary branch. Subsequently, we jointly refine the FP and low-bit branches to achieve simultaneous optimization. In addition, we propose a learnable bias alignment (LBA) module to reduce the biased quantization errors. Extensive experiments on synthetic and real-world datasets demonstrate that our method obtains comparable performance with the FP model and significantly outperforms recent leading low-bit quantization methods. Code is available at: https://github.com/bowenchai/QuantVSR.
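To make the dual-branch idea concrete, here is a minimal NumPy sketch of a linear layer split into a low-rank FP branch and a low-bit branch. It is not the authors' implementation. It assumes an SVD-based split in which the low-bit branch quantizes the residual, a symmetric per-tensor uniform quantizer, and a linear mapping from per-layer complexity scores to ranks. All function names (`quantize_uniform`, `split_low_rank`, `dual_branch_forward`, `allocate_rank`) are hypothetical, and the joint refinement and LBA training steps are omitted.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric per-tensor uniform fake-quantization (an assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the integer grid, clip to the signed range, map back to floats.
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def split_low_rank(w, rank):
    """Return the best rank-`rank` approximation of w (via SVD) and the residual."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return low_rank, w - low_rank

def dual_branch_forward(x, w, rank, bits):
    """FP low-rank branch plus low-bit branch on the residual (assumed split)."""
    low_rank, residual = split_low_rank(w, rank)
    return x @ low_rank.T + x @ quantize_uniform(residual, bits).T

def allocate_rank(complexity, min_rank, max_rank):
    """Hypothetical rule: map per-layer complexity linearly onto [min_rank, max_rank]."""
    c = np.asarray(complexity, dtype=float)
    span = c.max() - c.min()
    norm = (c - c.min()) / span if span > 0 else np.zeros_like(c)
    return np.rint(min_rank + norm * (max_rank - min_rank)).astype(int)

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 32))   # (out_features, in_features)
x = rng.standard_normal((4, 32))    # a batch of 4 activations
y = dual_branch_forward(x, w, rank=4, bits=4)
ranks = allocate_rank([0.0, 0.5, 1.0], min_rank=4, max_rank=16)
```

In this sketch, a layer with a higher complexity score gets a larger FP rank, so more of its weight matrix stays in full precision and less is left to the low-bit branch.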

Paper Structure

This paper contains 16 sections, 6 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Visual comparison among the full-precision VSR model, SVDQuant li2024svdquant, and our QuantVSR.
  • Figure 2: Performance comparisons on the real-world benchmark (i.e., MVSR4x wang2023benchmark) and compression ratio of our method. Bn represents n-bit quantization. Our QuantVSR compresses the model while retaining performance comparable to the FP model and surpassing existing quantization methods (e.g., ViDiT-Q zhao2024vidit and SVDQuant li2024svdquant).
  • Figure 3: Overview of our QuantVSR. First, we analyze the temporal and spatial complexity distribution of the calibration dataset and leverage these statistics to allocate layer-specific ranks. Next, we jointly refine the two branches in the spatio-temporal complexity aware mechanism. Finally, we train the learnable bias alignment module.
  • Figure 4: Visual comparison on synthetic (SPMCS yi2019progressive, REDS4 nah2019ntire) and real-world (MVSR4x wang2023benchmark) datasets at 6 / 4-bit quantization. Our approach outperforms existing methods, especially in the 4-bit setting.
  • Figure 5: Comparison of temporal consistency (stacking the red line across frames).
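
The temporal-consistency view in Figure 5 stacks one line of pixels across frames. A minimal sketch of building such a temporal profile follows. It assumes the line is a horizontal pixel row and that frames are stored as a `(T, H, W[, C])` array. The function name `temporal_profile` is hypothetical and not taken from the paper's code.

```python
import numpy as np

def temporal_profile(frames, row):
    """Stack the same pixel row from every frame into a (T, W[, C]) image."""
    frames = np.asarray(frames)
    # Indexing the height axis keeps time as the first axis of the result.
    return frames[:, row]

# Toy video: 2 frames, 3 rows, 4 columns, grayscale.
video = np.arange(2 * 3 * 4).reshape(2, 3, 4)
profile = temporal_profile(video, row=1)
```

In such a profile, smooth vertical structures indicate temporally consistent output, while jagged or noisy ones indicate flicker between frames.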