Rate-Distortion-Cognition Controllable Versatile Neural Image Compression

Jinming Liu, Ruoyu Feng, Yunpeng Qi, Qiuyu Chen, Zhibo Chen, Wenjun Zeng, Xin Jin

TL;DR

The paper tackles the problem of efficient image compression for downstream machine vision tasks by introducing a single, controllable codec that spans Rate, Distortion, and Cognition. It proposes a two-branch architecture: a cognition-oriented primary branch with a channel-controllable latent gain unit for variable bitrate, and a distortion-oriented auxiliary branch that transmits a scalable residual bitstream, combined via the interpolation $\bar{x} = \hat{x}_1 + (1-\beta)\boldsymbol{r}$. A MoCo-based cognition-oriented loss and a local MSE penalty are used to train the primary branch, while residual information is conveyed through a scalable bitstream to improve reconstruction quality, enabling a controllable trade-off between task performance and visual fidelity. Experiments on ImageNet classification, COCO detection, and Cityscapes segmentation demonstrate superior cognition performance at lower bitrates and competitive reconstruction quality, indicating practical benefits for deployment without training multiple codecs.
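
As a rough illustration of the interpolation step only, the following minimal sketch blends a cognition-oriented reconstruction with a residual using $\hat{x}_1 + (1-\beta)\boldsymbol{r}$. It assumes, for the sake of the example, that the residual is defined relative to $\hat{x}_1$, so that $\hat{x}_1 + \boldsymbol{r}$ plays the role of the distortion-oriented reconstruction; the function name `interpolate` and all pixel values are hypothetical and not taken from the paper.

```python
def interpolate(x1_hat, r, beta):
    """Return x1_hat + (1 - beta) * r, element by element."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [a + (1.0 - beta) * b for a, b in zip(x1_hat, r)]


# Toy flattened "images" with made-up pixel values.
x1_hat = [0.20, 0.50, 0.90]  # cognition-oriented reconstruction
x2_hat = [0.25, 0.40, 0.80]  # distortion-oriented reconstruction (assumed)

# Assumption for this sketch: the residual is measured against x1_hat.
r = [b - a for a, b in zip(x1_hat, x2_hat)]

cognition_only = interpolate(x1_hat, r, beta=1.0)  # residual fully ignored
fidelity_only = interpolate(x1_hat, r, beta=0.0)   # residual fully added
halfway = interpolate(x1_hat, r, beta=0.5)         # even blend of the two
```

Under this assumption the formula is algebraically the same as $\beta \hat{x}_1 + (1-\beta)\hat{x}_2$, so $\beta = 1$ keeps the cognition-oriented image and $\beta = 0$ recovers the distortion-oriented one.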

Abstract

Recently, the field of Image Coding for Machines (ICM) has garnered heightened interest and significant advances thanks to the rapid progress of learning-based techniques for image compression and analysis. Previous studies often require training separate codecs to support various bitrate levels, machine tasks, and networks, thus lacking both flexibility and practicality. To address these challenges, we propose a rate-distortion-cognition controllable versatile image compression method, which allows users to adjust the bitrate (i.e., Rate), image reconstruction quality (i.e., Distortion), and machine task accuracy (i.e., Cognition) with a single neural model, achieving ultra-controllability. Specifically, we first introduce a cognition-oriented loss in the primary compression branch to train a codec for diverse machine tasks. This branch attains variable bitrate by regulating the quantization degree through the latent code channels. To further enhance the quality of the reconstructed images, we employ an auxiliary branch to supplement residual information with a scalable bitstream. Ultimately, the two branches use a "$\beta x + (1 - \beta) y$" interpolation strategy to achieve a balanced cognition-distortion trade-off. Extensive experiments demonstrate that our method yields satisfactory ICM performance and flexible Rate-Distortion-Cognition control.
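
To make the idea of "regulating the quantization degree through the latent code channels" more concrete, here is a minimal sketch of channel-wise gain before rounding. It is a generic illustration under stated assumptions, not the paper's actual module: the function name `quantize_with_gain`, the gain values, and the toy latent are all hypothetical. Each channel is multiplied by its gain, rounded to integers, and divided by the same gain at the decoder, so a larger gain means finer quantization (and typically a higher bitrate).

```python
def quantize_with_gain(latent, gain):
    """Scale each channel by its gain, round, then undo the scaling.

    latent: list of channels, each a list of floats.
    gain:   one positive float per channel (hypothetical values here).
    """
    symbols = [[round(v * g) for v in channel]
               for channel, g in zip(latent, gain)]
    recon = [[s / g for s in channel]
             for channel, g in zip(symbols, gain)]
    return symbols, recon


def max_abs_error(latent, recon):
    """Largest absolute reconstruction error over all channels."""
    return max(abs(v - w)
               for ch_v, ch_w in zip(latent, recon)
               for v, w in zip(ch_v, ch_w))


# Toy latent with two channels (made-up values).
latent = [[0.37, -1.42, 2.05],
          [0.08, 0.61, -0.93]]

coarse_gain = [1.0, 1.0]  # coarse quantization: fewer distinct symbols
fine_gain = [4.0, 4.0]    # fine quantization: more distinct symbols

_, coarse_recon = quantize_with_gain(latent, coarse_gain)
_, fine_recon = quantize_with_gain(latent, fine_gain)

coarse_err = max_abs_error(latent, coarse_recon)
fine_err = max_abs_error(latent, fine_recon)
```

In this sketch the rounding error per element is bounded by half the quantization step, that is $0.5 / g$ for gain $g$, which is why varying the gains gives one model a range of rate points.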

Paper Structure (21 sections, 9 equations, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Motivation: (a) Traditional image compression tasks require training $K \times M \times T$ codecs for $K$ bitrate points, $M$ machine vision tasks, and $T$ cognition-distortion trade-offs. (b) Our method enables controllable rate-distortion-cognition with a single codec.
  • Figure 2: The visualization and spectrum of (a) the original image, (b) the distortion-oriented compressed image $\boldsymbol{\hat{x}}_2$, (c) the cognition-oriented compressed image $\boldsymbol{\hat{x}}_1$, and (d) the residual $|\boldsymbol{\hat{x}}_1 - \boldsymbol{\hat{x}}_2|$ of (b) and (c).
  • Figure 3: The framework of our method. Our pipeline is divided into a primary branch and an auxiliary branch. The primary branch generates the cognition-oriented image $\hat{x}_1$, while the auxiliary branch generates the distortion-oriented residual $r$. We then use a $\hat{x}_1 + (1 - \beta)r$ interpolation strategy to achieve a balanced cognition-distortion trade-off. These two branches are trained separately in two stages: the primary branch uses a cognition-oriented loss, and the auxiliary branch uses a distortion-oriented loss.
  • Figure 4: The training stage I. The parameters of the pretrained backbone remain frozen, with only the parameters of our codec being updated.
  • Figure 5: Histograms of pixel values for (a) the original image, (b) the distortion-oriented compressed image, (c) the cognition-oriented compressed image without the local MSE loss, (d) the cognition-oriented compressed image with clipping to the range (0, 1) during training, and (e) the cognition-oriented compressed image with the local MSE loss.
  • ...and 5 more figures