ARCHE: Autoregressive Residual Compression with Hyperprior and Excitation

Sofia Iliopoulou, Dimitris Ampeliotis, Athanassios Skodras

TL;DR

Visual comparisons confirm sharper textures and improved color fidelity, particularly at lower bit rates, demonstrating that accurate entropy modeling can be achieved through efficient convolutional designs suitable for practical deployment.

Abstract

Recent progress in learning-based image compression has demonstrated that end-to-end optimization can substantially outperform traditional codecs by jointly learning compact latent representations and probabilistic entropy models. However, many existing approaches achieve high rate-distortion efficiency at the expense of increased computational cost and limited parallelism. This paper presents ARCHE (Autoregressive Residual Compression with Hyperprior and Excitation), an end-to-end learned image compression framework that balances modeling accuracy and computational efficiency. The proposed architecture unifies hierarchical, spatial, and channel-based priors within a single probabilistic framework, capturing both global and local dependencies in the latent representation of the image, while employing adaptive feature recalibration and residual refinement to enhance latent representation quality. Without relying on recurrent or transformer-based components, ARCHE attains state-of-the-art rate-distortion efficiency: on the Kodak benchmark dataset, it reduces the BD-Rate by approximately 48% relative to the commonly used benchmark model of Ballé et al., 30% relative to the channel-wise autoregressive model of Minnen & Singh, and 5% relative to the VVC Intra codec. The framework maintains computational efficiency with 95M parameters and a running time of 222 ms per image. Visual comparisons confirm sharper textures and improved color fidelity, particularly at lower bit rates, demonstrating that accurate entropy modeling can be achieved through efficient convolutional designs suitable for practical deployment.
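
The BD-Rate figures quoted in the abstract are conventionally obtained with the Bjøntegaard procedure: the log of the bit rate is fitted as a cubic polynomial of quality (here PSNR) for each codec, both fits are integrated over the shared quality interval, and the mean difference is converted to a percentage. The Python sketch below illustrates that standard calculation only. It is not taken from the paper; the function name `bd_rate` is hypothetical, and the rate and PSNR values are made-up illustrative points, not ARCHE measurements.

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (negative means the test codec saves bits)."""
    psnr_ref = np.asarray(psnr_ref, dtype=float)
    psnr_test = np.asarray(psnr_test, dtype=float)
    log_r_ref = np.log(np.asarray(rate_ref, dtype=float))
    log_r_test = np.log(np.asarray(rate_test, dtype=float))

    # Fit log-rate as a cubic polynomial of quality for each codec.
    p_ref = np.polyfit(psnr_ref, log_r_ref, 3)
    p_test = np.polyfit(psnr_test, log_r_test, 3)

    # Integrate both fits over the overlapping quality interval only.
    lo = max(psnr_ref.min(), psnr_test.min())
    hi = min(psnr_ref.max(), psnr_test.max())
    P_ref, P_test = np.polyint(p_ref), np.polyint(p_test)
    int_ref = np.polyval(P_ref, hi) - np.polyval(P_ref, lo)
    int_test = np.polyval(P_test, hi) - np.polyval(P_test, lo)

    # Mean log-rate difference, mapped back to a relative rate change.
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

# Illustrative (made-up) operating points: the test codec needs exactly
# half the bits of the reference at every quality level.
psnr = [30.0, 32.0, 34.0, 36.0]
rate_ref = [0.2, 0.4, 0.7, 1.1]          # bits per pixel
rate_test = [r / 2 for r in rate_ref]
bd = bd_rate(rate_ref, psnr, rate_test, psnr)
print(f"BD-Rate: {bd:.2f}%")
```

Under this reading, a BD-Rate reduction of about 48% means that, averaged over the shared quality range, the codec needs roughly half the bits of the reference for the same PSNR.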

Paper Structure (19 sections, 16 equations, 12 figures, 5 tables)

Figures (12)

  • Figure 1: ARCHE architecture with hyperprior, masked context model, latent residual prediction, and two example slices showing channel conditioning with SE blocks. Q: quantization; AE/AD: arithmetic encoder/decoder; $\mu$, $\sigma$: distribution parameters.
  • Figure 2: The architecture of the masked context model. Type A masked convolution zeroes out the center position and all positions to its right and below it. Type B is the same but leaves the center position unmasked.
  • Figure 3: The architecture of the squeeze-and-excitation block. C denotes the number of channels and r is the reduction ratio.
  • Figure 4: Rate-distortion curves on the Kodak dataset. PSNR (left) and MS-SSIM (right) versus bit rate. ARCHE achieves superior performance across most operating points, surpassing VVC Intra.
  • Figure 5: Rate-distortion curves on the Tecnick dataset. PSNR (left) and MS-SSIM (right) versus bit rate. ARCHE achieves superior performance across most operating points, surpassing VVC Intra.
  • ...and 7 more figures
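
The masking rule in the Figure 2 caption and the squeeze-and-excitation block in the Figure 3 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `causal_mask` and `squeeze_excite` are hypothetical, the 3x3 kernel size and the sizes C = 8 and r = 4 are chosen only for the example, and the squeeze-and-excitation layers follow the commonly used form (global average pooling, a reducing layer with ReLU, an expanding layer with a sigmoid gate) with bias terms omitted.

```python
import numpy as np

def causal_mask(kernel_size, mask_type):
    """Binary (k, k) mask applied to a convolution kernel (hypothetical helper)."""
    assert mask_type in ("A", "B")
    k = kernel_size
    c = k // 2
    mask = np.zeros((k, k), dtype=np.float32)
    mask[:c, :] = 1.0        # every row above the center stays visible
    mask[c, :c] = 1.0        # same row, positions left of the center
    if mask_type == "B":
        mask[c, c] = 1.0     # type B additionally keeps the center position
    return mask

def squeeze_excite(x, w1, w2):
    """Channel recalibration. x: (C, H, W); w1: (C // r, C); w2: (C, C // r)."""
    s = x.mean(axis=(1, 2))                  # squeeze: global average pool -> (C,)
    z = np.maximum(w1 @ s, 0.0)              # reduce to C / r channels, ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))      # expand back to C, sigmoid gate
    return x * g[:, None, None]              # rescale each channel by its gate

print(causal_mask(3, "A"))
print(causal_mask(3, "B"))

# Example sizes are assumptions for illustration only.
C, r = 8, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((C, 4, 4))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y = squeeze_excite(x, w1, w2)
```

Multiplying a kernel by a type A mask before the convolution ensures that each output position depends only on previously decoded positions. A type B mask can be used in later layers, where the center of the feature map no longer carries the value being predicted.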