A Multi-Stage Optimization Framework for Deploying Learned Image Compression on FPGAs

Jiaxun Fang, Li Chen

TL;DR

The paper tackles the deployment of state-of-the-art learned image compression (LIC) on FPGAs. It first mitigates quantization-induced losses through Dynamic Range-Aware Quantization (DRAQ), which combines statistically calibrated activation clipping with outlier regularization to produce high-fidelity INT8 models, including GDN-based architectures. Building on this, it adds two hardware-aware optimizations: a Progressive Mixed-Precision Search that assigns non-uniform bit-widths per layer, and GDN-Slimming, which removes redundant channels; together they yield substantial complexity reductions with minimal RD impact. Empirical results show the INT8 BD-rate overhead reduced to as low as ~4.93% on a Ballé model, improved RD for GDN-based networks, and a ~20% reduction in GFLOPs with preserved RD in the final integrated system. Compared with existing FPGA LIC implementations, the integrated framework achieves state-of-the-art hardware efficiency and RD quality, offering a complete co-design pathway from FP32 LIC to FPGA-ready, high-performance deployments.

Abstract

Deep learning-based image compression (LIC) has achieved state-of-the-art rate-distortion (RD) performance, yet deploying these models on resource-constrained FPGAs remains a major challenge. This work presents a complete, multi-stage optimization framework to bridge the gap between high-performance floating-point models and efficient, hardware-friendly integer-based implementations. First, we address the fundamental problem of quantization-induced performance degradation. We propose a Dynamic Range-Aware Quantization (DRAQ) method that uses statistically calibrated activation clipping and a novel weight regularization scheme to counteract the effects of extreme data outliers and large dynamic ranges, successfully creating a high-fidelity 8-bit integer model. Second, building on this robust foundation, we introduce two hardware-aware optimization techniques tailored for FPGAs. A progressive mixed-precision search algorithm exploits FPGA flexibility to assign optimal, non-uniform bit-widths to each layer, minimizing complexity while preserving performance. Concurrently, a channel pruning method, adapted to work with the Generalized Divisive Normalization (GDN) layers common in LIC, removes model redundancy by eliminating inactive channels. Our comprehensive experiments show that the foundational DRAQ method reduces the BD-rate overhead of a GDN-based model from $30\%$ to $6.3\%$. The subsequent hardware-aware optimizations further reduce computational complexity by over $20\%$ with negligible impact on RD performance, yielding a final model that is both state-of-the-art in efficiency and superior in quality to existing FPGA-based LIC implementations.
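
As a hedged illustration of why clipping the activation range can help 8-bit quantization, the sketch below compares a min-max range with a percentile-based range on synthetic activations that contain a few extreme outliers. The percentile rule, the 99.9 threshold, the synthetic data, and every function name are assumptions made for this example. The abstract does not state which calibration statistic DRAQ uses, and this is not the authors' implementation.

```python
import random

def quantize_dequantize(values, clip, bits=8):
    """Symmetric uniform quantization: saturate to [-clip, clip], round, map back."""
    qmax = 2 ** (bits - 1) - 1              # 127 for INT8
    scale = clip / qmax                     # width of one quantization step
    out = []
    for v in values:
        q = round(v / scale)
        q = max(-qmax, min(qmax, q))        # saturate values outside the range
        out.append(q * scale)
    return out

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def percentile_abs(values, pct):
    """Magnitude below which roughly pct percent of the values fall."""
    mags = sorted(abs(v) for v in values)
    idx = min(len(mags) - 1, int(pct / 100.0 * len(mags)))
    return mags[idx]

# Synthetic activations (assumption): a unit-variance bulk plus five outliers.
rng = random.Random(0)
acts = [rng.gauss(0.0, 1.0) for _ in range(10000)]
acts[:5] = [400.0, -350.0, 300.0, -250.0, 200.0]
bulk = acts[5:]

clip_minmax = max(abs(v) for v in acts)     # range dictated by the outliers
clip_pct = percentile_abs(acts, 99.9)       # range calibrated on the bulk

# Error measured on the bulk only, i.e. the values the outliers crowd out.
bulk_err_minmax = mse(bulk, quantize_dequantize(bulk, clip_minmax))
bulk_err_pct = mse(bulk, quantize_dequantize(bulk, clip_pct))

print(f"min-max clip    {clip_minmax:8.3f}  bulk MSE {bulk_err_minmax:.6f}")
print(f"percentile clip {clip_pct:8.3f}  bulk MSE {bulk_err_pct:.6f}")
```

On this synthetic data the percentile range gives a much finer step size for the bulk of the values, at the cost of saturating the five outliers. The example measures only reconstruction error on the bulk; whether the trade pays off in RD terms is something only the paper's own evaluation can show.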

Paper Structure

This paper contains 49 sections, 11 equations, 13 figures, 9 tables.

Figures (13)

  • Figure 1: An overview of our proposed two-stage optimization framework. Stage I, the Dynamic Range-Aware Quantization (DRAQ) framework, addresses fundamental data distribution challenges by constraining activation ranges and penalizing weight outliers to create a high-fidelity integer model. Stage II applies hardware-aware optimizations tailored for FPGAs, including Channel Pruning via GDN-Slimming and a Progressive Mixed-Precision Search, to systematically balance performance and hardware complexity.
  • Figure 2: The evolution of seminal LIC architectures. (Left) The foundational variational autoencoder framework where latents $\mathbf{y}$ are quantized and reconstructed. (Center) The addition of a hyperprior network ($h_a, h_s$) that uses side information $\mathbf{z}$ to model the latent distribution, improving compression rate. (Right) The joint architecture, which further incorporates an autoregressive context model ($C$) to create a more powerful conditional probability model for the latents, leading to superior performance.
  • Figure 3: The challenge of quantizing LIC models, illustrated by the weight distribution of a single representative convolutional layer. The boxplot for each output channel reveals a high concentration of weights near zero (indicated by the small boxes), but also a tiny fraction of extreme outliers (circles) that are orders of magnitude larger. These outliers dictate an inefficiently wide quantization range, degrading precision for the majority of weights and motivating our proposed outlier-aware framework.
  • Figure 4: The challenge of activation quantization, illustrated on a single representative layer. (a) The dynamic range of activations varies dramatically across different channels, with a few channels having ranges far exceeding the majority. (b) Histograms reveal that channels also possess distinct distributions. This high channel-wise heterogeneity makes a single, shared per-layer quantization scale highly suboptimal, leading to significant information loss for most channels.
  • Figure 5: The workflow of our proposed Dynamic Range-Aware Quantization (DRAQ) framework. The process involves pre-training a baseline model, using a calibration set to determine optimal parameters for clipping and regularization, and then fine-tuning the model with these constraints to achieve a high-performance quantized model.
  • ...and 8 more figures
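
The caption of Figure 4 argues that a single per-layer quantization scale is a poor fit when channel ranges differ widely. A minimal sketch of that effect, under stated assumptions, is given below: the four channel standard deviations, the max-based scale rule, and all names are hypothetical and chosen for illustration, and the code is not taken from the paper.

```python
import random

QMAX = 127  # largest magnitude representable in symmetric INT8

def fake_quant(values, scale):
    """Round to the INT8 grid defined by `scale`, saturate, and map back to floats."""
    return [max(-QMAX, min(QMAX, round(v / scale))) * scale for v in values]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Synthetic layer (assumption): three narrow channels and one very wide channel.
rng = random.Random(1)
stds = [0.05, 0.1, 0.2, 25.0]
channels = [[rng.gauss(0.0, s) for _ in range(2000)] for s in stds]

# One shared scale for the whole layer, set by the largest magnitude anywhere.
layer_scale = max(abs(v) for ch in channels for v in ch) / QMAX

results = []
for std, ch in zip(stds, channels):
    channel_scale = max(abs(v) for v in ch) / QMAX   # scale fitted to this channel
    results.append({
        "std": std,
        "per_layer": mse(ch, fake_quant(ch, layer_scale)),
        "per_channel": mse(ch, fake_quant(ch, channel_scale)),
    })

for r in results:
    print(f"std={r['std']:<6} per-layer MSE={r['per_layer']:.3e} "
          f"per-channel MSE={r['per_channel']:.3e}")
```

In this toy setting the wide channel sets the shared scale, so the narrow channels are rounded onto a grid far coarser than their own range. The wide channel itself sees the same error either way, because it defines both scales.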