Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression
Hanyue Tu, Siqi Wu, Li Li, Wengang Zhou, Houqiang Li
TL;DR
This paper tackles the limitations of autoencoder-based learned image compression by introducing a lightweight multi-scale invertible neural network that bijectively maps images to latent representations, so that the transform itself discards no information and quantization is the only source of loss. It combines a four-level invertible transform with a multi-scale spatial-channel context model and extended gain units to support wide-range, variable-rate compression from a single model. Experimental results show state-of-the-art performance among variable-rate methods across a broad bitrate range, outperforming VVC in many regimes, especially at high bit rates, and remaining competitive with multi-model learned approaches, while achieving superior fidelity under repeated re-encodings. The approach offers practical benefits in model size, training efficiency, and robustness, making invertible transforms a compelling alternative for high-bitrate image compression.
Abstract
Autoencoder-based structures have dominated recent learned image compression methods. However, the inherent information loss associated with autoencoders limits their rate-distortion performance at high bit rates and restricts their flexibility in rate adaptation. In this paper, we present a variable-rate image compression model based on an invertible transform to overcome these limitations. Specifically, we design a lightweight multi-scale invertible neural network, which bijectively maps the input image into multi-scale latent representations. To improve the compression efficiency, a multi-scale spatial-channel context model with extended gain units is devised to estimate the entropy of the latent representation from high to low levels. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods, and remains competitive with recent multi-model approaches. Notably, our method is the first learned image compression solution that outperforms VVC across a very wide range of bit rates using a single model, especially at high bit rates. The source code is available at https://github.com/hytu99/MSINN-VRLIC.
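To make the idea of a bijective transform followed by gain-controlled quantization concrete, the following is a minimal sketch and not the architecture of the paper. It assumes a single additive coupling layer as the invertible transform, a fixed `tanh` function (`shift_fn`) as a hypothetical stand-in for a learned sub-network, and a single scalar `gain` in place of the extended gain units, whose exact form is not described here. All function names are illustrative.

```python
import math


def shift_fn(half):
    """Hypothetical stand-in for a learned sub-network (assumption, not from the paper)."""
    return [0.5 * math.tanh(v) for v in half]


def coupling_forward(x):
    """Additive coupling: keep the first half, shift the second half by f(first half)."""
    mid = len(x) // 2
    x1, x2 = x[:mid], x[mid:]
    y2 = [b + s for b, s in zip(x2, shift_fn(x1))]
    return x1 + y2


def coupling_inverse(y):
    """Exact inverse of coupling_forward: subtract the same shift again."""
    mid = len(y) // 2
    y1, y2 = y[:mid], y[mid:]
    x2 = [b - s for b, s in zip(y2, shift_fn(y1))]
    return y1 + x2


def quantize_with_gain(y, gain):
    """Scale by a scalar gain, round to integers, then undo the scaling.

    A larger gain means a finer quantization step (1 / gain), i.e. a higher rate.
    The scalar gain is a simplification assumed for this sketch.
    """
    return [round(v * gain) / gain for v in y]


def reconstruct(x, gain):
    """Transform, quantize in the latent domain, and invert the transform."""
    y_hat = quantize_with_gain(coupling_forward(x), gain)
    return coupling_inverse(y_hat)


def max_abs_error(a, b):
    return max(abs(u - v) for u, v in zip(a, b))


# Toy "image": eight pixel intensities in [0, 1].
pixels = [0.12, 0.57, 0.33, 0.91, 0.48, 0.05, 0.76, 0.64]

# Without quantization the round trip is lossless up to floating-point rounding.
roundtrip_error = max_abs_error(pixels, coupling_inverse(coupling_forward(pixels)))

# With quantization, the step size set by the gain is the only source of loss.
errors_by_gain = {g: max_abs_error(pixels, reconstruct(pixels, g)) for g in (4, 32, 256)}
```

In this sketch the transform alone reproduces the input up to floating-point precision, so all distortion comes from rounding in the latent domain. Because the step size is `1 / gain` and `shift_fn` has a bounded slope, the worst-case reconstruction error is bounded by a constant times `1 / gain`; raising the gain tightens that bound toward zero, which illustrates why a single invertible model can, in principle, cover a wide range of rates.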
