Controlling Rate, Distortion, and Realism: Towards a Single Comprehensive Neural Image Compression Model
Shoma Iwai, Tomo Miyazaki, Shinichiro Omachi
TL;DR
The paper addresses the impracticality of maintaining a separate model for each bit rate in generative neural image compression (NIC). It introduces a variable-rate GAN-based NIC model that jointly controls bit rate, distortion, and realism through two inputs: a quality level $q$ and a realism weight $\beta$. A novel HRRGAN loss and discriminator designs tailored to the variable-rate setting let a single model match or exceed state-of-the-art single-rate generative NIC models across a wide rate range, as evaluated on the CLIC2020 and Kodak datasets. The approach combines a two-stage training regime, $\beta$-conditioning for realism, and ICA-based rate control that provides 17 rate points, and it shows strong rate-distortion-realism trade-offs. In practice, this reduces model maintenance costs while delivering high perceptual quality across varied compression settings.
Abstract
In recent years, neural network-driven image compression (NIC) has gained significant attention. Some works adopt deep generative models such as GANs and diffusion models to enhance perceptual quality (realism). A critical obstacle for these generative NIC methods is that each model is optimized for a single bit rate. Consequently, multiple models are required to compress images to different bit rates, which is impractical for real-world applications. To tackle this issue, we propose a variable-rate generative NIC model. Specifically, we explore several discriminator designs tailored for the variable-rate approach and introduce a novel adversarial loss. Moreover, by incorporating the newly proposed multi-realism technique, our method allows users to adjust the bit rate, distortion, and realism with a single model, achieving ultra-controllability. Unlike existing variable-rate generative NIC models, our method matches or surpasses the performance of state-of-the-art single-rate generative NIC models while covering a wide range of bit rates using just one model. Code will be available at https://github.com/iwa-shi/CRDR
