Interleaved Block-based Learned Image Compression with Feature Enhancement and Quantization Error Compensation

Shiqi Jiang, Hui Yuan, Shuai Li, Raouf Hamzaoui, Xu Wang, Junyan Huo

TL;DR

This paper tackles two sources of inefficiency in learned image compression (LIC): information loss caused by quantization and insufficiently compact latent representations. It integrates four plug-and-play modules, namely the Feature Extraction Module (FExM), Feature Refinement Module (FRM), Feature Enhancement Module (FEnM), and Quantization Error Compensation Module (QECM), into a Tiny-LIC backbone to improve rate-distortion (RD) performance, leveraging pixel shuffling, concatenated 3D residual blocks, Fourier-series quantization error compensation, and an FE loss. On the Kodak and CLIC datasets, the approach achieves notable RD gains in both PSNR and MS-SSIM, outperforming traditional standards such as H.266/VVC and several LIC baselines while keeping complexity reasonable. Overall, the work offers a flexible, modular framework that enhances LIC through targeted feature manipulation and quantization error handling, with potential extensions to lighter architectures and other 3D tasks.
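The TL;DR mentions Fourier-series quantization compensation, but this page does not give the QECM formula. As a hedged illustration only, the sketch below uses the standard Fourier expansion of the rounding error, $x - \mathrm{round}(x) = \sum_{k \ge 1} (-1)^{k+1} \sin(2\pi k x)/(\pi k)$, to build a smooth, differentiable stand-in for rounding. The names `rounding_error_series` and `soft_round` and the truncation parameter `n_terms` are hypothetical; this is not the paper's module.

```python
import math


def rounding_error_series(x: float, n_terms: int) -> float:
    """Truncated Fourier series of the sawtooth x - round(x).

    Hypothetical helper: x - round(x) = sum_{k>=1} (-1)^(k+1) sin(2*pi*k*x) / (pi*k),
    which holds away from the jump discontinuities at half-integers.
    """
    return sum(
        (-1) ** (k + 1) * math.sin(2 * math.pi * k * x) / (math.pi * k)
        for k in range(1, n_terms + 1)
    )


def soft_round(x: float, n_terms: int = 10) -> float:
    """Smooth surrogate for round(x): subtract the approximated rounding error."""
    return x - rounding_error_series(x, n_terms)


# Example: with many terms the surrogate approaches true rounding.
print(soft_round(2.3, n_terms=200))  # close to 2.0
```

A larger `n_terms` tracks true rounding more closely, but like any Fourier partial sum of a discontinuous function it overshoots near half-integers (Gibbs phenomenon), so a practical design would trade accuracy against smoothness of the gradient.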

Abstract

In recent years, learned image compression (LIC) methods have achieved significant performance improvements. However, obtaining a more compact latent representation and reducing the impact of quantization errors remain key challenges in the field of LIC. To address these challenges, we propose a feature extraction module, a feature refinement module, and a feature enhancement module. Our feature extraction module shuffles the pixels in the image, splits the resulting image into sub-images, and extracts coarse features from the sub-images. Our feature refinement module stacks the coarse features and uses an attention refinement block composed of concatenated three-dimensional convolution residual blocks to learn more compact latent features by exploiting correlations across channels, within sub-images (intra-sub-image correlations), and across sub-images (inter-sub-image correlations). Our feature enhancement module reduces information loss in the decoded features following quantization. We also propose a quantization error compensation module that mitigates the quantization mismatch between training and testing. Our four modules can be readily integrated into state-of-the-art LIC methods. Experiments show that combining our modules with Tiny-LIC outperforms existing LIC methods and image compression standards in terms of peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM) on the Kodak dataset and the CLIC dataset.
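The interleaved split performed by FExM (see also Figure 3) can be illustrated with a minimal pure-Python sketch, assuming a single-channel image whose height and width are divisible by a split factor `s`. The function names and `s` are hypothetical; the paper's actual split factor and tensor layout are not stated here. Sub-image (i, j) keeps every pixel whose row % s == i and column % s == j, and the resulting list of s·s sub-images is the kind of stack that FRM could process with 3D convolutions across sub-images.

```python
from typing import List

Image = List[List[int]]  # H x W grid of pixel values (single channel)


def interleaved_split(img: Image, s: int) -> List[Image]:
    """Shuffle an H x W image into s*s interleaved sub-images of size (H/s) x (W/s).

    Sub-image index i*s + j holds the pixels at rows i, i+s, ... and
    columns j, j+s, ..., so each one is a decimated view of the whole image.
    """
    h, w = len(img), len(img[0])
    if h % s or w % s:
        raise ValueError("height and width must be divisible by s")
    return [[row[j::s] for row in img[i::s]] for i in range(s) for j in range(s)]


def reconstruct(subs: List[Image], s: int) -> Image:
    """Inverse of interleaved_split: write each sub-image's pixels back in place."""
    hs, ws = len(subs[0]), len(subs[0][0])
    out = [[0] * (ws * s) for _ in range(hs * s)]
    for idx, sub in enumerate(subs):
        i, j = divmod(idx, s)  # recover the (row, column) phase of this sub-image
        for y in range(hs):
            for x in range(ws):
                out[y * s + i][x * s + j] = sub[y][x]
    return out


img = [[r * 4 + c for c in range(4)] for r in range(4)]
subs = interleaved_split(img, 2)
print(subs[0])  # [[0, 2], [8, 10]]
```

A tensor implementation would more likely use an operation such as PyTorch's `torch.nn.functional.pixel_unshuffle`, which performs the same rearrangement into channels.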

Paper Structure

This paper contains 20 sections, 14 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: Overview of the proposed method. FExM denotes the feature extraction module. FRM denotes the feature refinement module. FEnM denotes the feature enhancement module. The codec adopts a joint autoregressive and hierarchical priors model, which consists of an analysis/synthesis transform, a hyper analysis/synthesis transform, a context model, an entropy parameters model, quantization and entropy coding. $\textbf{U}\mid\textbf{Q}$ denotes the quantization process, where $\textbf{U}$ represents adding uniform noise during training and $\textbf{Q}$ represents rounding quantization during testing. $\textbf{AE}$ and $\textbf{AD}$ denote arithmetic encoding and arithmetic decoding, respectively.
  • Figure 2: Framework. The codec uses a VAE-based architecture with a main and a hyper encoder-decoder. FExM applies image splitting and feature extraction. FRM refines the stacked features with the attention refinement block (ARB), which uses an attention block (AB) to refine the features in the channel, spatial, temporal, and feature dimensions. FEnM uses two dense blocks (DB) to enhance the distorted features. In the codec, each ICRSA consists of a conv layer, a concatenated residual module (CRM), and an RNAB. RNAB [lu2022high] aggregates neighborhood information based on attention. $d_i$ ($i=1,\dots,6$) is the number of RNABs used at the $i$-th stage. $\mathrm{St}$ and $\mathrm{Sr}$ denote stacking and rearranging, respectively. $k5s2$ denotes a $5 \times 5$ kernel with a stride of 2.
  • Figure 3: Split module and reconstruction module. Solid circles represent the original pixels, while solid rectangles represent the reconstructed pixels.
  • Figure 4: Attention Refinement Block. The 3D Res-b structure (shown in Figure 5(b)) is used as an example.
  • Figure 5: 3D Res blocks. (a) 3D Res-a: basic 3D residual block (RB); (b) 3D Res-b: two-level concatenated 3D residual block; (c) 3D Res-c: three-level 3D residual block.
  • ...and 8 more figures
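The $\textbf{U}\mid\textbf{Q}$ switch in Figure 1 is the usual training/testing quantization proxy in LIC. A minimal sketch, assuming a flat list of latent values and a hypothetical `quantize` helper: during training, additive uniform noise stands in for rounding so that gradients can flow; at test time, values are actually rounded. The difference between these two behaviors is the train/test mismatch that the abstract says QECM mitigates.

```python
import random
from typing import List


def quantize(y: List[float], training: bool, rng: random.Random) -> List[float]:
    """Hypothetical U|Q quantization proxy.

    training=True  -> U: add i.i.d. uniform noise drawn from [-0.5, 0.5]
    training=False -> Q: round to the nearest integer (Python's round()
                      breaks exact .5 ties toward the even integer)
    """
    if training:
        return [v + rng.uniform(-0.5, 0.5) for v in y]
    return [float(round(v)) for v in y]


rng = random.Random(0)
latents = [0.2, 1.7, -2.4]
print(quantize(latents, training=False, rng=rng))  # [0.0, 2.0, -2.0]
print(quantize(latents, training=True, rng=rng))   # each value perturbed by at most 0.5
```

Note that the training-time perturbation is random and independent of the latent value, whereas the test-time error `v - round(v)` is a deterministic function of `v`; the decoder therefore sees differently distributed errors in the two phases.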