A Multi-Stage Optimization Framework for Deploying Learned Image Compression on FPGAs

Jiaxun Fang; Li Chen

TL;DR

The paper tackles the deployment of state-of-the-art learned image compression (LIC) models on FPGAs. It first mitigates quantization-induced losses through Dynamic Range-Aware Quantization (DRAQ), which uses statistically calibrated activation clipping and outlier regularization to produce high-fidelity INT8 models, including GDN-based architectures. Building on this, it adds two hardware-aware optimizations: Progressive Mixed-Precision Search, which assigns non-uniform bit-widths per layer, and GDN-Slimming, which removes redundant channels; together they yield substantial complexity reductions with minimal RD impact. Empirical results show the INT8 BD-rate overhead falling to as low as ~4.93% on a Ballé model, improved RD performance for GDN-based networks, and a ~20% reduction in GFLOPs with RD performance preserved in the final integrated system. Compared with existing FPGA LIC implementations, the integrated framework achieves state-of-the-art hardware efficiency and RD quality, offering a complete co-design pathway from FP32 LIC models to FPGA-ready deployments.

Abstract

Deep learning-based image compression (LIC) has achieved state-of-the-art rate-distortion (RD) performance, yet deploying these models on resource-constrained FPGAs remains a major challenge. This work presents a complete, multi-stage optimization framework to bridge the gap between high-performance floating-point models and efficient, hardware-friendly integer-based implementations. First, we address the fundamental problem of quantization-induced performance degradation. We propose a Dynamic Range-Aware Quantization (DRAQ) method that uses statistically calibrated activation clipping and a novel weight regularization scheme to counteract the effects of extreme data outliers and large dynamic ranges, successfully creating a high-fidelity 8-bit integer model. Second, building on this robust foundation, we introduce two hardware-aware optimization techniques tailored for FPGAs. A progressive mixed-precision search algorithm exploits FPGA flexibility to assign optimal, non-uniform bit-widths to each layer, minimizing complexity while preserving performance. Concurrently, a channel pruning method, adapted to work with the Generalized Divisive Normalization (GDN) layers common in LIC, removes model redundancy by eliminating inactive channels. Our comprehensive experiments show that the foundational DRAQ method reduces the BD-rate overhead of a GDN-based model from $30\%$ to $6.3\%$. The subsequent hardware-aware optimizations further reduce computational complexity by over $20\%$ with negligible impact on RD performance, yielding a final model that is both state-of-the-art in efficiency and superior in quality to existing FPGA-based LIC implementations.
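
To make the idea of statistically calibrated activation clipping concrete, the following is a minimal sketch and not the paper's DRAQ procedure. It assumes a percentile-based threshold (the abstract does not say which statistic is used), symmetric INT8 quantization, and synthetic calibration data; the names `calibrate_clip`, `quantize_int8`, and the 99.9th-percentile setting are all hypothetical choices made for illustration.

```python
import random


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def calibrate_clip(activations, q=99.9):
    """Assumed calibration rule: clip at the q-th percentile of |activation|."""
    return percentile([abs(a) for a in activations], q)


def quantize_int8(x, clip):
    """Symmetric INT8 quantization: clamp to [-clip, clip], then round."""
    scale = clip / 127.0
    clipped = max(-clip, min(clip, x))
    return int(round(clipped / scale)), scale


def dequantize(qv, scale):
    """Map an integer code back to its real-valued approximation."""
    return qv * scale


def mean_abs_error(activations, clip):
    """Average reconstruction error after a quantize/dequantize round trip."""
    total = 0.0
    for a in activations:
        qv, scale = quantize_int8(a, clip)
        total += abs(dequantize(qv, scale) - a)
    return total / len(activations)


# Synthetic calibration set: mostly small values plus two extreme outliers.
random.seed(0)
acts = [random.gauss(0.0, 1.0) for _ in range(10000)] + [250.0, -300.0]

clip_minmax = max(abs(a) for a in acts)   # naive min/max range
clip_pct = calibrate_clip(acts, q=99.9)   # percentile-calibrated range

err_minmax = mean_abs_error(acts, clip_minmax)
err_pct = mean_abs_error(acts, clip_pct)

print(f"min/max clip    = {clip_minmax:.2f}, mean abs error = {err_minmax:.4f}")
print(f"percentile clip = {clip_pct:.2f}, mean abs error = {err_pct:.4f}")
```

With a min/max range, the two outliers stretch the quantization step so far that most typical activations collapse onto a few integer levels. Clipping at a calibrated threshold saturates the outliers but keeps fine resolution for the bulk of the distribution, which is the trade-off the abstract describes when it attributes quantization loss to extreme outliers and large dynamic ranges.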
