Controlling Rate, Distortion, and Realism: Towards a Single Comprehensive Neural Image Compression Model

Shoma Iwai, Tomo Miyazaki, Shinichiro Omachi

TL;DR

The paper addresses the impracticality of maintaining separate models for different bit rates in neural image compression (NIC). It introduces a variable-rate GAN-based NIC model that jointly controls bit rate, distortion, and realism through two inputs: the quality level $q$ and the realism weight $\beta$. A novel HRRGAN loss and carefully designed discriminators let this single model match or exceed state-of-the-art single-rate generative NIC methods across a wide range of bit rates, as validated on the CLIC2020 and Kodak datasets. The approach combines a two-stage training regime, beta-conditioning for realism control, and ICA-based rate control that realizes 17 rate points, demonstrating strong rate-distortion-realism trade-offs. In practice, it reduces model maintenance costs while delivering high perceptual quality across varied compression settings.
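The TL;DR attributes rate control to ICA, the interpolation channel attention layer of Sun_2021_ACMMM. The sketch below illustrates the general idea under stated assumptions: each discrete quality level owns a learned per-channel scaling vector, and a fractional $q$ linearly interpolates between the vectors of its two neighboring levels, so intermediate bit rates need no extra parameters. The function names, the linear blending rule, and the toy scale values are hypothetical and are not taken from the paper's code.

```python
import math
from typing import List, Sequence


def ica_scale(q: float, level_scales: Sequence[Sequence[float]]) -> List[float]:
    """Per-channel scale vector for a (possibly fractional) quality level q.

    Hypothetical sketch: level_scales[i] stands in for the learned scale vector
    of discrete quality level i; a fractional q blends levels floor(q) and floor(q) + 1.
    """
    n_levels = len(level_scales)
    if n_levels < 2:
        raise ValueError("need at least two quality levels to interpolate")
    if not 0.0 <= q <= n_levels - 1:
        raise ValueError(f"q must lie in [0, {n_levels - 1}], got {q}")
    # Clamp the lower index so that q == n_levels - 1 still has a valid upper neighbor.
    lo = min(math.floor(q), n_levels - 2)
    t = q - lo  # interpolation weight in [0, 1]
    lower, upper = level_scales[lo], level_scales[lo + 1]
    return [(1.0 - t) * a + t * b for a, b in zip(lower, upper)]


def apply_channel_attention(features: Sequence[Sequence[float]],
                            scale: Sequence[float]) -> List[List[float]]:
    """Multiply every value in channel c by scale[c]."""
    return [[v * s for v in channel] for channel, s in zip(features, scale)]


if __name__ == "__main__":
    # Toy scale vectors for three quality levels and two channels (made-up values).
    levels = [[1.0, 0.5], [2.0, 1.5], [4.0, 3.5]]
    print(ica_scale(0.5, levels))  # -> [1.5, 1.0], halfway between levels 0 and 1
    print(ica_scale(2.0, levels))  # -> [4.0, 3.5], exactly the top level
    print(apply_channel_attention([[1.0, -1.0], [2.0, 0.0]], ica_scale(1.0, levels)))
    # -> [[2.0, -2.0], [3.0, 0.0]]
```

Clamping the lower index is the one subtle step: without it, a $q$ equal to the top level would index past the last scale vector.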

Abstract

In recent years, neural network-driven image compression (NIC) has gained significant attention. Some works adopt deep generative models such as GANs and diffusion models to enhance perceptual quality (realism). A critical limitation of these generative NIC methods is that each model is optimized for a single bit rate. Consequently, multiple models are required to compress images to different bit rates, which is impractical for real-world applications. To tackle this issue, we propose a variable-rate generative NIC model. Specifically, we explore several discriminator designs tailored for the variable-rate approach and introduce a novel adversarial loss. Moreover, by incorporating the recently proposed multi-realism technique, our method allows users to adjust the bit rate, distortion, and realism with a single model, achieving ultra-controllability. Unlike existing variable-rate generative NIC models, our method matches or surpasses the performance of state-of-the-art single-rate generative NIC models while covering a wide range of bit rates using just one model. Code will be available at https://github.com/iwa-shi/CRDR
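To make the two control inputs concrete, the following is a minimal sketch of a per-sample generator objective in which $q$ sets the rate penalty and $\beta$ scales the adversarial (realism) term, in the spirit of the multi-realism approach the abstract builds on. Everything here is an assumption for illustration: the geometric mapping `rate_weight`, its default bounds, and the additive form of `generator_loss` are hypothetical and should not be read as the paper's loss, which additionally conditions the generator itself on $\beta$.

```python
def rate_weight(q: float, q_max: float = 1.0,
                lam_min: float = 0.5, lam_max: float = 8.0) -> float:
    """Hypothetical map from quality level q in [0, q_max] to a rate penalty.

    q = 0 gives the strongest penalty (lam_max, lowest bit rate) and q = q_max the
    weakest (lam_min, highest bit rate); values in between are interpolated geometrically.
    """
    if not 0.0 <= q <= q_max:
        raise ValueError(f"q must lie in [0, {q_max}], got {q}")
    t = q / q_max
    return lam_max * (lam_min / lam_max) ** t


def generator_loss(rate_bpp: float, distortion: float, adversarial: float,
                   q: float, beta: float, **weight_kwargs: float) -> float:
    """Hypothetical rate-distortion-realism objective for one training sample.

    beta = 0 reduces it to a pure rate-distortion loss; a larger beta weights realism more.
    """
    if beta < 0.0:
        raise ValueError("beta must be non-negative")
    return rate_weight(q, **weight_kwargs) * rate_bpp + distortion + beta * adversarial


if __name__ == "__main__":
    # Made-up per-sample numbers: 0.3 bpp, distortion 0.02, adversarial term 1.2.
    for q, beta in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
        print(q, beta, generator_loss(0.3, 0.02, 1.2, q=q, beta=beta))
```

Sampling $q$ and $\beta$ per training example is what would let one set of weights serve every operating point; at inference the same two numbers are simply passed in.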

Paper Structure (24 sections, 5 equations, 16 figures, 1 table, 1 algorithm)

Figures (16)

  • Figure 1: Top left: rate-distortion (measured by PSNR) and top right: rate-realism (measured by FID) performance using a single model. "bpp" stands for bits per pixel. While the state-of-the-art GAN-based NIC methods Multi-Realism [Agustsson2023] (capable of adjusting the distortion-realism trade-off) and HiFiC [Mentzer2020] are each optimized for a single bit rate, our method can control the balance between rate, distortion, and realism, covering the green area with just one model. This is achieved by adjusting two input parameters, $q$ and $\beta$, which control the rate and the distortion-realism trade-off [Blau2018], respectively. Bottom: the original image and compression results of our method, illustrating that it can handle different compression settings such as (A) a low-rate, low-distortion mode and (D) a high-rate, high-realism mode.
  • Figure 2: Overview of our NIC model. "RBs" and "Attn" in the encoder and generator stand for the residual blocks and attention module used in ELIC [He_2022_CVPR]. ICA is an interpolation channel attention layer [Sun_2021_ACMMM] (see the right side for details). AE and AD denote the arithmetic encoder and arithmetic decoder, respectively. For the generator, we use beta-conditioning [Agustsson2023] to control the realism of the reconstruction.
  • Figure 3: The discriminator designs we consider. The discriminators take an original image or its reconstruction as input and estimate how realistic the input is. They consist of two kinds of convolution layers: (1) layers applied to all quality levels and (2) layers applied only to a specific quality level. For layers of type (1), we introduce a quality condition (right).
  • Figure 4: How the relative "reality score" is calculated for (a) RaGAN in an unconditional GAN, (b) RGAN in variable-rate image compression, and (c) HRRGAN in variable-rate image compression.
  • Figure 5: Quantitative results on the CLIC2020 test set (top) and the Kodak dataset (bottom). We use PSNR to evaluate rate-distortion performance, and FID and LPIPS to evaluate rate-realism performance. Solid lines represent variable-rate methods, while dashed lines denote single-rate methods. As for the markers, circles ($\bullet$ and $\circ$) represent variable-realism NIC methods, triangles ($\blacktriangle$) indicate generative NIC methods, and lines without markers indicate non-generative methods. LPIPS on the CLIC2020 dataset is reported in the supplementary material.
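Figure 4(a) refers to the relativistic average GAN (RaGAN), whose "reality score" compares each critic output with the mean critic output of the opposite class. The sketch below implements that standard formulation with the non-saturating log loss on plain Python lists of critic logits. It covers only the unconditional case in panel (a); the variable-rate RGAN and HRRGAN variants in panels (b) and (c) are not reproduced here, and the function names are hypothetical.

```python
import math
from statistics import fmean
from typing import List, Sequence, Tuple


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0.0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def ragan_relative_scores(real_logits: Sequence[float],
                          fake_logits: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Relative reality scores C(x_r) - E[C(x_f)] and C(x_f) - E[C(x_r)]."""
    mean_real = fmean(real_logits)
    mean_fake = fmean(fake_logits)
    rel_real = [c - mean_fake for c in real_logits]
    rel_fake = [c - mean_real for c in fake_logits]
    return rel_real, rel_fake


def ragan_losses(real_logits: Sequence[float],
                 fake_logits: Sequence[float]) -> Tuple[float, float]:
    """Discriminator and generator losses of RaGAN (non-saturating log form)."""
    rel_real, rel_fake = ragan_relative_scores(real_logits, fake_logits)
    # Discriminator: reals should score above the average fake, fakes below the average real.
    # log(1 - sigmoid(x)) is computed as log_sigmoid(-x).
    d_loss = -fmean(log_sigmoid(r) for r in rel_real) - fmean(log_sigmoid(-f) for f in rel_fake)
    # Generator: the roles are swapped, so its loss also depends on the real samples' scores.
    g_loss = -fmean(log_sigmoid(f) for f in rel_fake) - fmean(log_sigmoid(-r) for r in rel_real)
    return d_loss, g_loss


if __name__ == "__main__":
    # Made-up critic logits for originals (real) and reconstructions (fake).
    print(ragan_relative_scores([2.0, 4.0], [0.0, 1.0]))  # -> ([1.5, 3.5], [-3.0, -2.0])
    print(ragan_losses([2.0, 4.0], [0.0, 1.0]))
```

Averaging over the opposite class is what makes the score relative: a reconstruction is judged against the typical original rather than in isolation.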