ROI-based Deep Image Compression with Implicit Bit Allocation

Kai Hu, Han Wang, Renhe Liu, Zhilin Li, Shenghui Song, Yu Liu

TL;DR

The paper addresses ROI-based image compression by replacing explicit hard ROI gating with implicit bit allocation realized through a Mask-Guided Feature Enhancement (MGFE) module that combines Region-Adaptive Attention (RAA) and Frequency-Spatial Collaborative Attention (FSCA). By integrating a dual-decoder architecture for foreground and background reconstruction, the method preserves ROI fidelity while maintaining background quality, all optimized under a rate–distortion objective with region-specific losses. Empirical results on COCO2017 show substantial improvements in ROI and overall RD performance, outperforming explicit bit-allocation baselines and traditional codecs, and yielding notable gains in downstream computer-vision tasks. The work demonstrates that implicit bit allocation can better preserve salient details, improve entropy modeling, and enhance CV task accuracy, with potential practical impact for bandwidth-constrained image transmission and analysis pipelines.
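As a rough illustration of what a rate-distortion objective with region-specific losses can look like, the sketch below weights foreground and background distortion separately and adds the bit rate. It is not the paper's loss: the function name `region_rd_loss`, the weights `lam`, `alpha`, and `beta`, and the use of mean squared error are all assumptions made for this example.

```python
import numpy as np


def region_rd_loss(x, x_fg_hat, x_bg_hat, mask, bpp,
                   lam=0.01, alpha=1.0, beta=0.1):
    """Hypothetical region-weighted RD loss: R + lam * (alpha * D_fg + beta * D_bg).

    x        : original image, shape (H, W)
    x_fg_hat : output of the foreground decoder, shape (H, W)
    x_bg_hat : output of the background decoder, shape (H, W)
    mask     : binary ROI mask, 1 inside the ROI, 0 outside
    bpp      : estimated bit rate in bits per pixel
    """
    x = np.asarray(x, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    inv_mask = 1.0 - mask

    # Guard against empty regions so the division is always defined.
    fg_area = max(mask.sum(), 1.0)
    bg_area = max(inv_mask.sum(), 1.0)

    # Mean squared error restricted to each region (assumed distortion metric).
    d_fg = (mask * (x - np.asarray(x_fg_hat)) ** 2).sum() / fg_area
    d_bg = (inv_mask * (x - np.asarray(x_bg_hat)) ** 2).sum() / bg_area

    return float(bpp + lam * (alpha * d_fg + beta * d_bg))
```

Choosing `alpha` larger than `beta` penalizes ROI errors more heavily, which is one simple way to push a codec toward spending more bits on the foreground without masking any features.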

Abstract

Region of Interest (ROI)-based image compression has rapidly developed due to its ability to maintain high fidelity in important regions while reducing data redundancy. However, existing compression methods primarily apply masks to suppress background information before quantization. This explicit bit allocation strategy, which uses hard gating, significantly impacts the statistical distribution of the entropy model, thereby limiting the coding performance of the compression model. In response, this work proposes an efficient ROI-based deep image compression model with implicit bit allocation. To better utilize ROI masks for implicit bit allocation, this paper proposes a novel Mask-Guided Feature Enhancement (MGFE) module, comprising a Region-Adaptive Attention (RAA) block and a Frequency-Spatial Collaborative Attention (FSCA) block. This module allows for flexible bit allocation across different regions while enhancing global and local features through frequency-spatial domain collaboration. Additionally, we use dual decoders to separately reconstruct foreground and background images, enabling the coding network to optimally balance foreground enhancement and background quality preservation in a data-driven manner. To the best of our knowledge, this is the first work to utilize implicit bit allocation for high-quality region-adaptive coding. Experiments on the COCO2017 dataset show that our implicit-based image compression method significantly outperforms explicit bit allocation approaches in rate-distortion performance, achieving optimal results while maintaining satisfactory visual quality in the reconstructed background regions.
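The contrast between explicit hard gating and implicit allocation can be sketched on a toy latent tensor. The code below is only an illustration under stated assumptions: `explicit_gate` and `implicit_modulate` are hypothetical names, and the fixed gains `fg_gain` and `bg_gain` stand in for the attention maps that the MGFE module would learn from the ROI mask.

```python
import numpy as np


def explicit_gate(y, mask):
    """Hard gating: zero out background latents before quantization."""
    return y * mask


def implicit_modulate(y, mask, fg_gain=1.0, bg_gain=0.5):
    """Soft modulation: rescale features per region instead of zeroing them.

    The fixed gains are placeholders (assumption); a learned attention map
    would take their place in a trained model.
    """
    attention = bg_gain + (fg_gain - bg_gain) * mask
    return y * attention


def zero_fraction(y_hat):
    """Share of quantized latents that are exactly zero."""
    return float(np.mean(y_hat == 0))
```

Applying `np.round` to the output of `explicit_gate` forces every background latent to zero, which adds a spike at zero to the latent histogram that the entropy model must fit. The soft variant attenuates background values but leaves them free to carry information.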

Paper Structure

This paper contains 17 sections, 9 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: An overview of different bit allocation methods. Our approach not only utilizes ROI masks to generate attention maps for adaptively enhancing or suppressing features across different regions but also employs a dual-decoder architecture to balance the reconstruction quality between the foreground (FG) and background (BG).
  • Figure 2: Visualization of latent features and bit allocation maps trained with explicit and implicit bit allocation methods. We also visualize the loss curves for RD optimization during training and the ROI-PSNR curves on the test validation set under different methods. Explicit methods preserve ROI features less completely, and the image as a whole, especially the ROI, consumes more bits. The convergence curves also show that the explicit method converges too early, which can easily lead to a local optimum and suboptimal RD performance.
  • Figure 3: The overall framework of our proposed method. "MGFE" denotes the Mask-Guided Feature Enhancement module. AE and AD denote the arithmetic encoder and decoder, respectively. N is set to 192 and M to 320, as in zou2022devil.
  • Figure 4: The distribution histogram and normal probability plot computed from statistics of the quantized latent features $\boldsymbol{\hat{y}}$. The experimental validation used over 3000 images from the COCO2017 dataset. The baseline model is the classic LIC model (STF) zou2022devil.
  • Figure 5: Illustration of the MGFE module and of the FA and SA blocks within the FSCA block. The MGFE module is mainly composed of an RAA block and an FSCA block.
  • ...and 5 more figures
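
The ROI-PSNR curves mentioned in the Figure 2 caption measure reconstruction quality inside the region of interest only. A minimal sketch of such a metric follows, assuming the common definition of PSNR computed over the masked pixels. The function name `roi_psnr` and the `max_val` default are choices made for this example, and the paper's exact evaluation protocol may differ.

```python
import numpy as np


def roi_psnr(x, x_hat, mask, max_val=255.0):
    """PSNR in dB computed over the pixels where mask is nonzero."""
    roi = np.asarray(mask).astype(bool)
    if not roi.any():
        raise ValueError("ROI mask is empty")

    # Boolean indexing keeps only the ROI pixels; background errors are ignored.
    diff = np.asarray(x, dtype=np.float64)[roi] - np.asarray(x_hat, dtype=np.float64)[roi]
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Because the background is excluded, a codec can score well on this metric while degrading the rest of the image, which is why the summary above reports background quality alongside it.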