DynaQuant: Dynamic Mixed-Precision Quantization for Learned Image Compression

Youneng Bao, Yulong Cheng, Yiping Liu, Yichen Yang, Peng Qin, Mu Li, Yongsheng Liang

TL;DR

DynaQuant tackles the inefficiency of static bit-widths in Learned Image Compression (LIC) by introducing two intertwined dynamics: content-aware quantization and a data-driven dynamic bit-width selector. It employs per-channel learnable quantization parameters and a distance-aware gradient modulator to provide informative learning signals, while a differentiable bit-width selector assigns layer-wise bit-widths based on input statistics; both are jointly optimized under a rate-distortion objective. The end-to-end framework achieves R-D performance close to that of full-precision models while delivering speedups of up to roughly $5\times$ and substantially smaller model sizes, enabling practical LIC deployment on diverse hardware. This work advances LIC efficiency by combining quantization with input- and layer-aware adaptation, offering a practical path toward real-time, resource-constrained image compression.

Abstract

Prevailing quantization techniques in Learned Image Compression (LIC) typically employ a static, uniform bit-width across all layers, failing to adapt to the highly diverse data distributions and sensitivity characteristics inherent in LIC models. This leads to a suboptimal trade-off between performance and efficiency. In this paper, we introduce DynaQuant, a novel framework for dynamic mixed-precision quantization that operates on two complementary levels. First, we propose content-aware quantization, where learnable scaling and offset parameters dynamically adapt to the statistical variations of latent features. This fine-grained adaptation is trained end-to-end using a novel Distance-aware Gradient Modulator (DGM), which provides a more informative learning signal than the standard Straight-Through Estimator. Second, we introduce a data-driven, dynamic bit-width selector that learns to assign an optimal bit precision to each layer, dynamically reconfiguring the network's precision profile based on the input data. Our fully dynamic approach offers substantial flexibility in balancing rate-distortion (R-D) performance and computational cost. Experiments demonstrate that DynaQuant achieves R-D performance comparable to full-precision models while significantly reducing computational and storage requirements, thereby enabling the practical deployment of advanced LIC on diverse hardware platforms.
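
As a rough illustration of the two ingredients named in the abstract, the sketch below shows uniform affine quantization with a per-channel scale and offset, plus one possible gradient proxy. The helper names (`fake_quantize`, `quantize_channels`, `proxy_grad`) and the specific proxy $g(x) = x - \frac{\beta}{2\pi}\sin(2\pi x)$ are assumptions made for this example. The proxy was chosen only because its derivative matches the qualitative description in the Figure 2 caption (strictly positive, smallest at integers, largest at half-integers, amplitude set by $\beta$); it is not necessarily the DGM used in the paper.

```python
import math

def fake_quantize(x, scale, offset, bits):
    """Uniform affine fake-quantization of one value (hypothetical helper).

    scale and offset stand in for the learnable per-channel parameters;
    bits is the bit-width. Returns the dequantized value.
    """
    levels = 2 ** bits - 1                  # largest integer code
    code = round((x - offset) / scale)      # map onto the integer grid
    code = max(0, min(levels, code))        # clamp to the representable range
    return code * scale + offset            # map back to the real axis

def quantize_channels(features, scales, offsets, bits):
    """Apply fake_quantize channel by channel; features[c] is a list of floats."""
    return [[fake_quantize(v, s, o, bits) for v in channel]
            for channel, s, o in zip(features, scales, offsets)]

def proxy_grad(x, beta):
    """Derivative of the assumed proxy g(x) = x - beta / (2*pi) * sin(2*pi*x).

    It equals 1 - beta at integers and 1 + beta at half-integers, so it stays
    strictly positive for 0 < beta < 1. A plain Straight-Through Estimator
    would instead return 1 everywhere.
    """
    return 1.0 - beta * math.cos(2.0 * math.pi * x)
```

In a training loop, a factor such as `proxy_grad` would multiply the incoming gradient in place of the constant 1 used by the Straight-Through Estimator, so values far from a grid point receive a larger update than values already sitting on one.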

Paper Structure

This paper contains 27 sections, 8 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Visual and quantitative comparison of DynaQuant on kodim04, using Cheng2020 [cheng2020learned] as the baseline. Our two proposed quantization strategies, fixed bit-width quantization (Q-Cheng) and dynamic bit-width quantization (DQ-Cheng), achieve performance comparable to the full-precision method while delivering an approximately $5\times$ speedup.
  • Figure 2: Gradient proxy function (top) and its derivative (bottom). The derivative exhibits periodic oscillations, reaching minima at $x=0$ and $x=1$ with peaks at $x=0.5$. All values remain strictly positive, and the amplitude is modulated by $\beta$, ensuring adaptive gradient scaling.
  • Figure 3: (a) DynaQuant framework overview. DQ-Block is the dynamic quantization block, and the Bit-Width Selector dynamically allocates quantization precision for each layer. (b) Bit-Width Selector and DQ-Block structure. The bit-width selector processes the input activation $A$ through adaptive pooling, an MLP, and Gumbel-Softmax to output a bit-width selection probability distribution $p_1, p_2, \ldots, p_n$. DQ-Block quantizes the input $\{X\}$ and the learnable parameters $\{W\}$ within the module at each candidate bit-width, then generates the output through probability-weighted fusion according to that distribution.
  • Figure 4: R-D performance. (a) Kodak; (b) JPEG-AI; (c) CLIC. Quantization schemes: [FP32] for the original 32-bit float, [INT8] for uniform 8-bit quantization, and [INTX.YY] for our mixed-precision quantization, where X.YY indicates the average bit-width (e.g., 6.81-bit for DQ-ELIC, 6.20-bit for DQ-Cheng). "Q-ELIC" refers to applying Dynamic Parameter Adaptation (DPA) to each layer of ELIC, while "DQ-ELIC" denotes applying a Dynamic Bit-Width Selector (DBWS) to each layer of ELIC. Best viewed in color.
  • Figure 5: Qualitative comparison on a Kodak image (e.g., "kodim23") using the base model Cheng2020. (a) Full-size images (first row). (b) Zoomed-in regions highlighting texture and edge details (second row). Methods, from left to right: Ground Truth, 32-bit full-precision, our Dynamic Parameter Adaptation (Q-Cheng), and our Dynamic Bit-Width Selector (DQ-Cheng). Metrics include bpp, PSNR, and speedup. Best viewed digitally and zoomed in.
  • ...and 1 more figure
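
The selector pipeline described in the Figure 3 caption (adaptive pooling, an MLP, Gumbel-Softmax, then probability-weighted fusion) can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions: the function names, the single hidden ReLU layer, the temperature `tau`, and the toy shapes are all hypothetical and are not taken from the paper.

```python
import math
import random

def global_avg_pool(activation):
    """Reduce each channel (a list of floats) to its mean."""
    return [sum(channel) / len(channel) for channel in activation]

def linear(x, weights, bias):
    """Dense layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def gumbel_softmax(logits, tau, rng):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    peak = max(noisy)                                   # for numerical stability
    exps = [math.exp(n - peak) for n in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def select_bitwidths(activation, w1, b1, w2, b2, tau, rng):
    """Pool -> hidden ReLU layer -> logits -> probabilities over candidate bit-widths."""
    pooled = global_avg_pool(activation)
    hidden = [max(0.0, h) for h in linear(pooled, w1, b1)]
    logits = linear(hidden, w2, b2)
    return gumbel_softmax(logits, tau, rng)

def fuse(probs, branch_outputs):
    """Probability-weighted sum of the outputs computed at each bit-width."""
    size = len(branch_outputs[0])
    return [sum(p * out[i] for p, out in zip(probs, branch_outputs))
            for i in range(size)]

# Toy usage: two channels, three candidate bit-widths (e.g. 4, 6 and 8 bits).
rng = random.Random(0)
probs = select_bitwidths(
    activation=[[0.2, 0.4], [1.0, 3.0]],
    w1=[[0.5, -0.1], [0.3, 0.2]], b1=[0.0, 0.1],
    w2=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], b2=[0.0, 0.0, 0.0],
    tau=1.0, rng=rng,
)
```

Because the relaxed sample is a smooth function of the logits, gradients from the rate-distortion loss can reach the selector through the fused output. Lowering `tau` pushes the probabilities toward a one-hot choice of a single bit-width.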