Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
Jinming Liu, Ruoyu Feng, Yunpeng Qi, Qiuyu Chen, Zhibo Chen, Wenjun Zeng, Xin Jin
TL;DR
The paper tackles the problem of efficient image compression for downstream machine vision tasks by introducing a single, controllable codec that spans Rate, Distortion, and Cognition. It proposes a two-branch architecture: a cognition-oriented primary branch with a channel-controllable latent gain unit for variable bitrate, and a distortion-oriented auxiliary branch that transmits a scalable residual bitstream, combined via interpolation $\bar{x} = \hat{x}_1 + (1-\beta)\boldsymbol{r}$. A MoCo-based cognition-oriented loss and a local MSE penalty are used to train the primary branch, while residual information is conveyed through a scalable bitstream to improve reconstruction quality, enabling a controllable trade-off between task performance and visual fidelity. Experiments on ImageNet classification, COCO detection, and Cityscapes segmentation demonstrate superior cognition performance at lower bitrates and competitive reconstruction quality, indicating practical benefits for deployment without training multiple codecs.
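The interpolation step can be illustrated with a minimal sketch. It assumes that the auxiliary branch yields an enhanced reconstruction equal to the primary output plus the decoded residual (called `x2_hat` here), so that the residual form and the two-output form of the blend coincide. The function names and pixel values are hypothetical and are not taken from the paper's code.

```python
def interpolate(x1_hat, residual, beta):
    """Residual form of the blend: x_bar = x1_hat + (1 - beta) * residual."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [a + (1.0 - beta) * r for a, r in zip(x1_hat, residual)]


def interpolate_two_outputs(x1_hat, x2_hat, beta):
    """Two-output form of the blend: beta * x1_hat + (1 - beta) * x2_hat."""
    return [beta * a + (1.0 - beta) * b for a, b in zip(x1_hat, x2_hat)]


# Hypothetical values: primary-branch pixels and a decoded residual.
x1_hat = [0.20, 0.50, 0.90]
residual = [0.04, -0.02, 0.00]

# Assumption: the distortion-oriented output is the primary output plus the residual.
x2_hat = [a + r for a, r in zip(x1_hat, residual)]

# beta = 1 keeps only the cognition-oriented output; beta = 0 adds the full residual.
cognition_only = interpolate(x1_hat, residual, 1.0)
full_residual = interpolate(x1_hat, residual, 0.0)
halfway = interpolate(x1_hat, residual, 0.5)
```

Under this assumption, $\beta$ acts as a single knob: values near 1 favor the reconstruction tuned for machine tasks, while values near 0 favor visual fidelity.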
Abstract
Recently, the field of Image Coding for Machines (ICM) has garnered heightened interest and significant advances thanks to the rapid progress of learning-based techniques for image compression and analysis. Previous studies often require training separate codecs to support various bitrate levels, machine tasks, and networks, thus lacking both flexibility and practicality. To address these challenges, we propose a rate-distortion-cognition controllable versatile image compression method, which allows users to adjust the bitrate (i.e., Rate), image reconstruction quality (i.e., Distortion), and machine task accuracy (i.e., Cognition) with a single neural model, achieving ultra-controllability. Specifically, we first introduce a cognition-oriented loss in the primary compression branch to train a codec for diverse machine tasks. This branch attains variable bitrate by regulating the quantization degree through the latent code channels. To further enhance the quality of the reconstructed images, we employ an auxiliary branch to supplement residual information with a scalable bitstream. Finally, the two branches use a $\beta x + (1 - \beta) y$ interpolation strategy to achieve a balanced cognition-distortion trade-off. Extensive experiments demonstrate that our method yields satisfactory ICM performance and flexible Rate-Distortion-Cognition control.
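As a rough illustration of how a per-channel gain can regulate the quantization degree, the sketch below scales each latent channel by a gain before rounding and divides by the same gain afterwards. This is a generic gain-then-round scheme assumed for illustration; the paper's actual gain unit, its learned parameters, and its entropy model are not reproduced here, and all names and values are hypothetical.

```python
def quantize_with_gain(latent, gains):
    """Scale each channel by its gain, round to integers, then undo the scaling.

    latent: list of channels, each a list of floats.
    gains:  one positive gain per channel (assumed, not the paper's learned values).
    """
    symbols = [[round(v * g) for v in ch] for ch, g in zip(latent, gains)]
    recon = [[s / g for s in ch] for ch, g in zip(symbols, gains)]
    return symbols, recon


def max_abs_error(latent, recon):
    """Largest absolute reconstruction error over all channels."""
    return max(abs(v - r)
               for ch, rc in zip(latent, recon)
               for v, r in zip(ch, rc))


latent = [[0.3, -1.2, 2.6]]  # a single hypothetical latent channel

# A small gain gives a coarse effective step; a large gain gives a fine one.
symbols_low, recon_low = quantize_with_gain(latent, [0.5])
symbols_high, recon_high = quantize_with_gain(latent, [4.0])

print(symbols_low, max_abs_error(latent, recon_low))
print(symbols_high, max_abs_error(latent, recon_high))
```

In this scheme the effective quantization step for a channel is the reciprocal of its gain, so raising the gain lowers the error and spreads the symbols over a wider integer range, which generally costs more bits to encode.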
