HarmoQ: Harmonized Post-Training Quantization for High-Fidelity Image

Hongjun Wang, Jiyuan Chen, Xuan Song, Yinqiang Zheng

TL;DR

HarmoQ targets efficient, high-fidelity image super-resolution (SR) under post-training quantization. It starts from an observed asymmetry: weight quantization mainly degrades structural similarity (SSIM), while activation quantization mainly harms pixel-level accuracy (PSNR). To mitigate these coupled errors jointly, it introduces a unified three-step framework: Structural Residual Calibration, Harmonized Scale Optimization, and Adaptive Boundary Refinement. The approach provides closed-form solutions for weight calibration and scale, plus gradient-based boundary refinement. It reports PSNR/SSIM gains at 2-bit and 3-bit quantization (0.46 dB over prior art on Set5 at 2-bit), along with a 3.2x speedup and 4x memory reduction on A100 GPUs. By systematically analyzing weight-activation coupling and proposing a principled optimization flow, HarmoQ aims to enable robust, efficient deployment of high-quality SR models on resource-constrained devices.

Abstract

Post-training quantization offers an efficient pathway to deploy super-resolution models, yet existing methods treat weight and activation quantization independently, missing their critical interplay. Through controlled experiments on SwinIR, we uncover a striking asymmetry: weight quantization primarily degrades structural similarity, while activation quantization disproportionately affects pixel-level accuracy. This stems from their distinct roles--weights encode learned restoration priors for textures and edges, whereas activations carry input-specific intensity information. Building on this insight, we propose HarmoQ, a unified framework that harmonizes quantization across components through three synergistic steps: structural residual calibration proactively adjusts weights to compensate for activation-induced detail loss, harmonized scale optimization analytically balances quantization difficulty via closed-form solutions, and adaptive boundary refinement iteratively maintains this balance during optimization. Experiments show HarmoQ achieves substantial gains under aggressive compression, outperforming prior art by 0.46 dB on Set5 at 2-bit while delivering 3.2x speedup and 4x memory reduction on A100 GPUs. This work provides the first systematic analysis of weight-activation coupling in super-resolution quantization and establishes a principled solution for efficient high-quality image restoration.

Paper Structure

This paper contains 39 sections, 5 theorems, 35 equations, 5 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

For any projection $H \in \mathbb{R}^{k\times d}$ and regularization $\lambda > 0$, the weight update that minimizes the structural distortion objective (eq:hf_loss) is $\delta_W^\star = -\,W\,\mathbb{E}[\delta_x x^\top]\,H^\top \left(H\,\mathbb{E}[x x^\top]\,H^\top + \lambda I_k\right)^{-1}$.
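The closed form in Theorem 1 can be evaluated numerically once the two expectations are replaced by sample averages. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `structural_correction` is hypothetical, $\delta_x$ is assumed to be a per-sample activation perturbation (for example, an activation quantization error), the expectations are estimated from `n` calibration samples, and $H$ is chosen as a square 1-D Laplacian matrix purely for illustration.

```python
import numpy as np

def structural_correction(W, X, dX, H, lam):
    """Evaluate -W E[dx x^T] H^T (H E[x x^T] H^T + lam I_k)^{-1}.

    W:  (m, d) layer weights
    X:  (n, d) calibration activations, one sample per row
    dX: (n, d) perturbations delta_x (assumed: activation errors)
    H:  (k, d) projection matrix
    lam: regularization strength, must be > 0
    """
    n = X.shape[0]
    k = H.shape[0]
    C_dx = dX.T @ X / n                    # empirical E[delta_x x^T], shape (d, d)
    C_xx = X.T @ X / n                     # empirical E[x x^T], shape (d, d)
    A = H @ C_xx @ H.T + lam * np.eye(k)   # (k, k), symmetric positive definite
    B = W @ C_dx @ H.T                     # (m, k)
    # Solve Z A = B for Z instead of forming the inverse; A is symmetric,
    # so Z = solve(A, B^T)^T.
    return -np.linalg.solve(A, B.T).T

# Toy data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
m, d, n = 4, 8, 256
W = rng.standard_normal((m, d))
X = rng.standard_normal((n, d))
dX = 0.05 * rng.standard_normal((n, d))
# Square 1-D Laplacian (second-difference) matrix as the projection H.
H = 2.0 * np.eye(d) - np.eye(d, k=1) - np.eye(d, k=-1)
lam = 1e-2

delta_W = structural_correction(W, X, dX, H, lam)
print(delta_W.shape)
```

Solving the linear system rather than inverting the $k \times k$ matrix is a numerical design choice; the regularization $\lambda > 0$ keeps that matrix positive definite, so the solve is well posed.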

Figures (5)

  • Figure 1: Weight vs. activation quantization analysis. (a) Performance comparison showing that W32A4 (activation quantization) primarily degrades PSNR while W4A32 (weight quantization) shows stronger SSIM degradation, indicating complementary effects on pixel-level accuracy versus structural similarity. (b, c) Visual comparison of W32A4 and W4A32 quantization results on anime image reconstruction.
  • Figure 2: Layer-wise quantization sensitivity analysis in SwinIR. The stacked bar chart shows the proportion of sensitivity to weight versus activation quantization across four layer types. Attention and GELU layers are highly sensitive to activation quantization (92.5% and 93.6%), while shallow layers show balanced sensitivity. Attention maps compare (a) W32A4 and (b) W4A32 configurations, illustrating the differing quantization impacts in the attention layer.
  • Figure 3: Structural Residual Calibration analysis in SwinIR. The top-left panel presents a heatmap of the optimal weight calibration $\Delta W^*$ computed with a Laplacian filter, revealing structured channel relationships. The top-right panel compares responses before and after $\Delta W^*$ calibration, demonstrating structure-aware modulation. The bottom panel illustrates the efficacy across layers for different projection matrices. The Laplacian filter achieves the best performance (see the projection ablation table, tab:hf_projection_ablation, for a quantitative comparison), while random projection fails because it lacks a structure-aware design. Cost analysis via the Frobenius norm of $\Delta W$ shows consistent magnitude across projection types.
  • Figure 4: Comparison of quantization optimization strategies. (a) Existing methods (2DQuant, DOBI) independently optimize weight and activation quantization boundaries through separate calibration processes, leading to suboptimal parameter selection. (b) Our HarmoQ framework employs a unified three-step optimization: calibration for initial range estimation, harmonized scaling using optimal factor $s^*$ to balance quantization difficulty, and iterative moving bounds refinement to jointly minimize compound quantization errors. The arrows indicate the iterative optimization flow between scaling and boundary adjustment steps.
  • Figure 5: Visual comparison of different quantization methods on a benchmark dataset.
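The W32A4 and W4A32 configurations in Figure 1 quantize only the activations or only the weights, respectively. The sketch below illustrates that setup on a single linear layer. It is an assumption-laden toy, not the paper's quantizer: `fake_quantize` is a hypothetical symmetric min-max uniform quantizer, `psnr` takes the peak from the full-precision output rather than from an image range, and the random data stands in for real SR features, so the numbers it prints say nothing about the asymmetry reported in the paper.

```python
import numpy as np

def fake_quantize(t, bits):
    """Symmetric uniform fake quantization with a min-max scale (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for 4 bits
    scale = np.max(np.abs(t)) / qmax
    if scale == 0:
        return t.copy()
    q = np.clip(np.round(t / scale), -qmax - 1, qmax)
    return q * scale                       # de-quantized values on the integer grid

def psnr(ref, out):
    """PSNR in dB, using the peak magnitude of the reference (an assumption)."""
    mse = np.mean((ref - out) ** 2)
    peak = np.max(np.abs(ref))
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))          # toy layer weights
x = rng.standard_normal((64, 32))          # toy input activations

y_fp = x @ W.T                             # full-precision output (W32A32)
y_w4a32 = x @ fake_quantize(W, 4).T        # weights quantized only
y_w32a4 = fake_quantize(x, 4) @ W.T        # activations quantized only

print("W4A32 PSNR (dB):", psnr(y_fp, y_w4a32))
print("W32A4 PSNR (dB):", psnr(y_fp, y_w32a4))
```

To probe the structural side as well, the same two outputs could be compared with an SSIM implementation on real image data; that step is omitted here to keep the example self-contained.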

Theorems & Definitions (10)

  • Theorem 1: Optimal Structural Correction
  • proof
  • Theorem 2: Harmonized Scale Optimization
  • proof
  • Theorem 3: Boundary Optimization Gradients
  • proof
  • Theorem 4: Convergence of HarmoQ Algorithm
  • proof: Proof Sketch
  • Theorem 5: Computational Complexity
  • proof