Outlier-Aware Post-Training Quantization for Image Super-Resolution

Hailing Wang, Jianglin Lu, Yitian Zhang, Yun Fu

TL;DR

The paper targets post-training quantization (PTQ) for image super-resolution (SR), where existing methods overlook activation outliers. It shows that these outliers are tied to image color information and introduces outlier-aware dual-region quantization, which uses a breakpoint $bp$ to split activations into a dense region and an outlier region so that both are represented well. It further proposes sensitivity-aware finetuning, which places more weight on highly quantization-sensitive layers and optimizes a combined loss using calibration data only. Across SR models and datasets, the method outperforms existing PTQ approaches and achieves performance comparable to quantization-aware training (QAT), while delivering substantial speedups and requiring no ground-truth high-resolution images. By preserving color fidelity under low-bit precision, it makes accurate, fast SR inference on edge devices practical without costly retraining.

Abstract

Quantization techniques, including quantization-aware training (QAT) and post-training quantization (PTQ), have become essential for inference acceleration of image super-resolution (SR) networks. Compared to QAT, PTQ has garnered significant attention as it eliminates the need for ground truth and model retraining. However, existing PTQ methods for SR often fail to achieve satisfactory performance as they overlook the impact of outliers in activation. Our empirical analysis reveals that these prevalent activation outliers are strongly correlated with image color information, and directly removing them leads to significant performance degradation. Motivated by this, we propose a dual-region quantization strategy that partitions activations into an outlier region and a dense region, applying uniform quantization to each region independently to better balance bit-width allocation. Furthermore, we observe that different network layers exhibit varying sensitivities to quantization, leading to different levels of performance degradation. To address this, we introduce sensitivity-aware finetuning that encourages the model to focus more on highly sensitive layers, further enhancing quantization performance. Extensive experiments demonstrate that our method outperforms existing PTQ approaches across various SR networks and datasets, while achieving performance comparable to QAT methods in most scenarios with at least a $75\times$ speedup.
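
The dual-region idea in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the function names, the fixed `dense_fraction` split of the available levels, the even split of outlier levels between the two tails, and the hand-picked breakpoint are all assumptions made for illustration. The sketch only shows why giving the dense region its own uniform grid can reduce error when a few large outliers stretch the activation range.

```python
def uniform_quantize(x, lo, hi, levels):
    """Snap x to the nearest of `levels` evenly spaced points on [lo, hi]."""
    if levels < 2 or hi <= lo:
        return lo  # degenerate grid: a single point
    step = (hi - lo) / (levels - 1)
    idx = round((x - lo) / step)
    idx = max(0, min(levels - 1, idx))  # clamp to the grid
    return lo + idx * step


def dual_region_quantize(values, bp, bits=4, dense_fraction=0.5):
    """Quantize the dense region [-bp, bp] and the two outlier tails
    with separate uniform grids.

    Assumption (not from the paper): the 2**bits levels are split by
    `dense_fraction`, and the remainder is shared between the two tails.
    """
    levels = 2 ** bits
    dense_levels = max(2, int(levels * dense_fraction))
    tail_levels = levels - dense_levels
    neg_levels = tail_levels // 2
    pos_levels = tail_levels - neg_levels
    if neg_levels < 2:
        raise ValueError("need at least 2 levels per outlier tail")
    l_a, u_a = min(values), max(values)  # observed activation range
    out = []
    for v in values:
        if -bp <= v <= bp:
            out.append(uniform_quantize(v, -bp, bp, dense_levels))
        elif v > bp:
            out.append(uniform_quantize(v, bp, u_a, pos_levels))
        else:
            out.append(uniform_quantize(v, l_a, -bp, neg_levels))
    return out


def single_region_quantize(values, bits=4):
    """Baseline: one uniform grid over the full observed range."""
    lo, hi = min(values), max(values)
    return [uniform_quantize(v, lo, hi, 2 ** bits) for v in values]


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Synthetic activations: many small values plus two large outliers.
    acts = [i / 100 for i in range(-50, 51)] + [8.0, -6.0]
    print("single-region MSE:", mse(acts, single_region_quantize(acts)))
    print("dual-region MSE:  ", mse(acts, dual_region_quantize(acts, bp=0.5)))
```

On this synthetic input the single-region grid spends most of its 16 points on the nearly empty span between the dense cluster and the outliers, while the dual-region grid keeps the outliers representable and still resolves the dense cluster. How the breakpoint and the level split are actually chosen is specific to the paper and is not reproduced here.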

Paper Structure

This paper contains 17 sections, 5 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Comparison of our method with SOTA PTQ and QAT baselines. In (b), GT denotes ground truth, the bubble size indicates the amount of training data required, and performance is averaged across four datasets.
  • Figure 2: After clipping $1\%$ of activation outliers in the full-precision model, the outputs (bottom) exhibit noticeable color distortions compared to the original ones (top), affecting both global regions and detail-rich local areas of the images.
  • Figure 3: The activation distributions of three different samples at the same layer (body.15.conv1) in EDSR exhibit variations in range (Range) and skewness (Skew). These distributions are divided by a breakpoint $bp$ into a dense region $[-bp, bp]$ and an outlier region $[l_a, -bp) \cup (bp, u_a]$, both of which undergo uniform quantization to the corresponding quantization points $qp$.
  • Figure 4: Performance comparison of 4-bit quantization applied individually to each layer of EDSR and SRResNet. Certain layers show a notable drop compared to the upper bound (full-precision performance), indicating higher sensitivity to quantization.
  • Figure 5: Visual comparison between different PTQ methods using the RDN network under the W4A4 setting. While baseline approaches suffer from various artifacts, our method effectively preserves fine details across different scenarios.
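
The per-layer sensitivity shown in Figure 4 suggests one simple way to turn "focus more on highly sensitive layers" into numbers. The Python sketch below is a hypothetical illustration, not the paper's loss: it assumes sensitivity is measured as the score drop (for example, in PSNR) when a single layer is quantized, normalizes those drops into weights, and uses them to weight per-layer errors. The function names, the layer names other than body.15.conv1, and all numeric values are invented for the example and are not results from the paper.

```python
def sensitivity_weights(fp_score, quantized_scores):
    """Map per-layer score drops to non-negative weights that sum to 1.

    fp_score: score of the full-precision model (the upper bound).
    quantized_scores: {layer_name: score when only that layer is quantized}.
    """
    drops = {name: max(0.0, fp_score - s) for name, s in quantized_scores.items()}
    total = sum(drops.values())
    if total == 0.0:
        # No layer hurts the score: fall back to uniform weights.
        uniform = 1.0 / len(drops)
        return {name: uniform for name in drops}
    return {name: d / total for name, d in drops.items()}


def weighted_layer_loss(weights, layer_errors):
    """Weighted sum of per-layer errors (e.g. feature-matching errors)."""
    return sum(weights[name] * layer_errors[name] for name in weights)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    scores = {"head": 31.9, "body.15.conv1": 31.0, "tail": 32.0}
    w = sensitivity_weights(32.0, scores)
    errors = {"head": 0.2, "body.15.conv1": 0.5, "tail": 0.1}
    print(w)
    print(weighted_layer_loss(w, errors))
```

Under this weighting, a layer whose individual quantization costs the most receives the largest share of the finetuning objective, and a layer that costs nothing receives none. The paper's actual sensitivity measure and combined loss may differ.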