Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks

Cheeun Hong, Kyoung Mu Lee

TL;DR

This work tackles the accuracy drop that occurs when quantizing image super-resolution networks by addressing distribution mismatch across channels and inputs. It introduces ODM, a quantization-aware training framework that combines cooperative mismatch regularization for activations with layer-wise weight clipping correction, eliminating the need for expensive dynamic test-time adaptation. ODM delivers state-of-the-art SR quantization performance across CNN and Transformer architectures (EDSR, RDN, SwinIR) with minimal computational overhead, outperforming existing methods such as PAMS, DAQ, DDTB, and QuantSR. The approach improves the practical deployability of ultra-low-bit SR models on resource-constrained devices while maintaining high restoration quality.
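To make the fixed-range problem concrete, here is a minimal sketch of uniform fake quantization with one shared clipping range. The channel values, the range [-4, 4], and the helper names `fake_quantize` and `mean_abs_error` are illustrative assumptions, not taken from the paper; the point is only that a range sized for a wide channel leaves almost no resolution for a narrow one.

```python
import math


def fake_quantize(x: float, lo: float, hi: float, bits: int) -> float:
    """Clip x to [lo, hi], snap it to one of 2**bits evenly spaced levels, and map it back."""
    step = (hi - lo) / (2 ** bits - 1)
    clipped = min(max(x, lo), hi)
    # Round half up so results do not depend on Python's round-half-to-even rule.
    return lo + math.floor((clipped - lo) / step + 0.5) * step


def mean_abs_error(values, lo, hi, bits):
    """Average absolute difference between values and their fake-quantized versions."""
    return sum(abs(v - fake_quantize(v, lo, hi, bits)) for v in values) / len(values)


# Two hypothetical channels with very different scales (illustrative numbers only).
narrow_channel = [-0.05, -0.02, 0.01, 0.04]
wide_channel = [-3.9, -1.2, 1.5, 3.6]

# A single fixed 2-bit range, sized to cover the wide channel.
LO, HI, BITS = -4.0, 4.0, 2

for name, channel in [("narrow", narrow_channel), ("wide", wide_channel)]:
    scale = max(abs(v) for v in channel)
    err = mean_abs_error(channel, LO, HI, BITS)
    print(f"{name}: mean |error| = {err:.3f} ({err / scale:.2f}x the channel's max |value|)")
```

With these numbers the 2-bit grid is {-4, -1.33, 1.33, 4}, so every value in the narrow channel is pushed to ±1.33, an error many times larger than the channel itself, while the wide channel is represented reasonably well.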

Abstract

Although quantization has emerged as a promising approach to reducing computational complexity across various high-level vision tasks, it inevitably leads to accuracy loss in image super-resolution (SR) networks. This is due to the significantly divergent feature distributions across different channels and input images of the SR networks, which complicates the selection of a fixed quantization range. Existing works address this distribution mismatch problem by dynamically adapting quantization ranges to the varying distributions during test time. However, such a dynamic adaptation incurs additional computational costs during inference. In contrast, we propose a new quantization-aware training scheme that effectively Overcomes the Distribution Mismatch problem in SR networks without the need for dynamic adaptation. Intuitively, this mismatch can be mitigated by regularizing the distance between the feature and a fixed quantization range. However, we observe that such regularization can conflict with the reconstruction loss during training, negatively impacting SR accuracy. Therefore, we opt to regularize the mismatch only when the gradients of the regularization are aligned with those of the reconstruction loss. Additionally, we introduce a layer-wise weight clipping correction scheme to determine a more suitable quantization range for layer-wise weights. Experimental results demonstrate that our framework effectively reduces the distribution mismatch and achieves state-of-the-art performance with minimal computational overhead.
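The abstract does not spell out how gradient alignment is tested. The sketch below assumes a per-parameter rule, which is our reading and not necessarily the authors': the regularization gradient is added only where its sign agrees with the reconstruction gradient. `cosine_similarity` and `conflict_ratio` mirror the two quantities plotted in Figure 2; all function names, the weight `lam`, and the gradient values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two flattened gradient vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def conflict_ratio(g_rec, g_reg):
    """Fraction of parameters whose two gradients have strictly opposite signs."""
    conflicted = sum(1 for r, m in zip(g_rec, g_reg) if r * m < 0)
    return conflicted / len(g_rec)


def cooperative_gradient(g_rec, g_reg, lam):
    """Hypothetical gating rule: add lam * g_reg only where it agrees in sign with g_rec."""
    return [r + lam * m if r * m > 0 else r for r, m in zip(g_rec, g_reg)]


# Toy gradients for four parameters (made-up values, not from the paper).
g_rec = [0.4, -0.2, 0.1, -0.3]   # reconstruction-loss gradient
g_reg = [0.2, 0.5, -0.1, -0.6]   # mismatch-regularization gradient

print("cosine similarity:", cosine_similarity(g_rec, g_reg))
print("conflict ratio:", conflict_ratio(g_rec, g_reg))
print("naive joint gradient:", [r + 0.5 * m for r, m in zip(g_rec, g_reg)])
print("cooperative gradient:", cooperative_gradient(g_rec, g_reg, lam=0.5))
```

Here the second and third parameters conflict, so the gated update leaves them to the reconstruction gradient alone, while the first and fourth still receive the regularization push.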

Paper Structure

This paper contains 28 sections, 9 equations, 7 figures, 19 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Distribution mismatch in SR networks. Compared to a classification network (e.g., ResNet-18), an SR network (e.g., EDSR) exhibits significant mismatches within the feature distributions across channel and image dimensions. The large distribution mismatch complicates the selection of an appropriate quantization range. Channels and images from the 2nd layer are randomly selected for visualization. Additional results are available in the supplementary material.
  • Figure 2: Conflict between mismatch regularization and reconstruction loss. Mismatch regularization updates a number of parameters in the direction opposite to that of the reconstruction loss, which we refer to as gradient conflict. (b) When the two losses are used jointly, gradient conflict consistently occurs during training, yielding negative cosine similarity values. (c) We plot the ratio of conflicted gradients during training. Nearly half of the parameters undergo gradient conflict, which indicates that merely combining mismatch regularization with the reconstruction loss can impair SR accuracy. Visualizations are done on EDSR.
  • Figure 3: Layer-wise variation in error from weight quantization. (a) Quantization error (QE) varies across layers when a fixed global policy (i.e., max) is used to determine the quantization range, particularly at low bit-widths. For some layers, max is not a suitable policy for selecting the quantization range. (b) Outliers often dominate the quantization range, so quantization grids are wasted on low-density regions. (c) At low bit-widths, quantization grids fail to adequately cover high-density regions. Therefore, the quantization range should be adjusted for certain layers.
  • Figure 4: Qualitative results on Urban100 with EDSR-, SwinIR-, and RDN-based models.
  • Figure 5: Distribution of activations before quantization. Our cooperative mismatch regularization yields distributions that are more robust to low-bit quantization. Activations of the 8th conv layer of EDSR-ODM (2-bit) on the 'baby' image (Set5) are visualized.
  • ...and 2 more figures
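Figure 3 motivates correcting the weight quantization range per layer, but the captions do not give the correction rule. As a stand-in, the sketch below uses a common heuristic: a grid search over clipping ratios of each layer's max |w| that minimizes the mean squared quantization error. The candidate ratios, the symmetric quantizer, the helper names, and the synthetic weights are all assumptions for illustration, not the paper's method.

```python
import math


def quantize_symmetric(w, alpha, bits):
    """Symmetric uniform quantizer with clipping threshold alpha (levels -qmax..qmax)."""
    qmax = 2 ** (bits - 1) - 1
    scale = alpha / qmax
    q = max(-qmax, min(qmax, math.floor(w / scale + 0.5)))
    return q * scale


def quant_mse(weights, alpha, bits):
    """Mean squared error between weights and their quantized values."""
    return sum((w - quantize_symmetric(w, alpha, bits)) ** 2 for w in weights) / len(weights)


def corrected_clip(weights, bits, ratios=(1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3)):
    """Pick alpha = ratio * max|w| with the lowest MSE; ratio 1.0 is the plain max policy."""
    w_max = max(abs(w) for w in weights)
    best_ratio = min(ratios, key=lambda r: quant_mse(weights, r * w_max, bits))
    return best_ratio * w_max


# Synthetic layers (assumed data): one dominated by a single outlier, one without outliers.
layers = {
    "with_outlier": [0.1 * math.sin(1.7 * i) for i in range(200)] + [1.0],
    "no_outlier": [0.1 * math.sin(1.7 * i) for i in range(200)],
}

BITS = 3
for name, weights in layers.items():
    w_max = max(abs(w) for w in weights)
    alpha = corrected_clip(weights, BITS)
    print(f"{name}: max policy MSE = {quant_mse(weights, w_max, BITS):.5f}, "
          f"corrected alpha = {alpha:.3f} (MSE = {quant_mse(weights, alpha, BITS):.5f})")
```

Because ratio 1.0 is among the candidates, the searched range never does worse than max on the weights it is fitted to; layers where max is already adequate simply keep it, while the outlier-dominated layer trades a large error on one weight for finer grids over the dense bulk.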