ROI-based Deep Image Compression with Implicit Bit Allocation
Kai Hu, Han Wang, Renhe Liu, Zhilin Li, Shenghui Song, Yu Liu
TL;DR
The paper addresses ROI-based image compression by replacing explicit hard ROI gating with implicit bit allocation realized through a Mask-Guided Feature Enhancement (MGFE) module that combines Region-Adaptive Attention (RAA) and Frequency-Spatial Collaborative Attention (FSCA). By integrating a dual-decoder architecture for foreground and background reconstruction, the method preserves ROI fidelity while maintaining background quality, all optimized under a rate–distortion objective with region-specific losses. Empirical results on COCO2017 show substantial improvements in ROI and overall RD performance, outperforming explicit bit-allocation baselines and traditional codecs, and yielding notable gains in downstream computer-vision tasks. The work demonstrates that implicit bit allocation can better preserve salient details, improve entropy modeling, and enhance CV task accuracy, with potential practical impact for bandwidth-constrained image transmission and analysis pipelines.
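The TL;DR mentions a dual-decoder design trained with a rate–distortion objective and region-specific losses. A minimal NumPy sketch of how such a loss could be assembled follows. It assumes the foreground decoder is scored only inside the ROI mask, the background decoder only outside it, and the rate is given as a precomputed bits-per-pixel value. The names `masked_mse`, `compose` and `roi_rd_loss`, the fusion rule, and the weights `lam`, `w_fg` and `w_bg` are hypothetical illustrations, not the paper's published formulation.

```python
import numpy as np

def masked_mse(x: np.ndarray, x_hat: np.ndarray, m: np.ndarray) -> float:
    """MSE averaged over the pixels selected by mask m (m broadcasts over channels)."""
    m = np.broadcast_to(m, x.shape)
    denom = max(float(m.sum()), 1.0)  # guard against an empty region
    return float((((x - x_hat) ** 2) * m).sum() / denom)

def compose(x_fg: np.ndarray, x_bg: np.ndarray, m: np.ndarray) -> np.ndarray:
    """Hypothetical fusion: ROI pixels from the foreground decoder, the rest from the background decoder."""
    return m * x_fg + (1.0 - m) * x_bg

def roi_rd_loss(x, x_fg, x_bg, m, bpp, lam=100.0, w_fg=1.0, w_bg=0.2):
    """Hypothetical region-specific RD loss: L = R + lam * (w_fg * D_fg + w_bg * D_bg).

    bpp stands in for the rate term, which a real codec would estimate from
    the entropy model's likelihoods; lam, w_fg and w_bg are illustrative only.
    """
    d_fg = masked_mse(x, x_fg, m)        # foreground decoder judged on the ROI only
    d_bg = masked_mse(x, x_bg, 1.0 - m)  # background decoder judged outside the ROI
    return bpp + lam * (w_fg * d_fg + w_bg * d_bg)

rng = np.random.default_rng(0)
x = rng.random((3, 8, 8))               # toy RGB image in [0, 1], shape (C, H, W)
m = np.zeros((8, 8))
m[:4, :] = 1.0                          # toy ROI: top half of the image
x_fg = x + 0.01                         # stand-in foreground reconstruction
x_bg = x + 0.10                         # stand-in background reconstruction
print(roi_rd_loss(x, x_fg, x_bg, m, bpp=0.5))
print(compose(x_fg, x_bg, m).shape)
```

Setting `w_bg` below `w_fg` is one way to express "favor the ROI" purely through the training objective, so the encoder learns where to spend bits instead of having background latents zeroed by a mask.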
Abstract
Region of Interest (ROI)-based image compression has rapidly developed due to its ability to maintain high fidelity in important regions while reducing data redundancy. However, existing compression methods primarily apply masks to suppress background information before quantization. This explicit bit allocation strategy, which uses hard gating, significantly impacts the statistical distribution of the entropy model, thereby limiting the coding performance of the compression model. In response, this work proposes an efficient ROI-based deep image compression model with implicit bit allocation. To better utilize ROI masks for implicit bit allocation, this paper proposes a novel Mask-Guided Feature Enhancement (MGFE) module, comprising a Region-Adaptive Attention (RAA) block and a Frequency-Spatial Collaborative Attention (FSCA) block. This module allows for flexible bit allocation across different regions while enhancing global and local features through frequency-spatial domain collaboration. Additionally, we use dual decoders to separately reconstruct foreground and background images, enabling the coding network to optimally balance foreground enhancement and background quality preservation in a data-driven manner. To the best of our knowledge, this is the first work to utilize implicit bit allocation for high-quality region-adaptive coding. Experiments on the COCO2017 dataset show that our implicit-based image compression method significantly outperforms explicit bit allocation approaches in rate-distortion performance, achieving optimal results while maintaining satisfactory visual quality in the reconstructed background regions.
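To make the abstract's point about hard gating concrete, the sketch below multiplies a latent tensor by a binary ROI mask before rounding (explicit bit allocation) and compares the resulting symbol statistics with ungated rounding. The Laplacian latent, its shape, and the mask layout are assumptions chosen for illustration; the sketch does not reproduce the paper's entropy model or the MGFE module. It only shows how gating piles a large, spatially structured spike of zeros onto the symbol distribution that the entropy model must describe.

```python
import numpy as np

def empirical_entropy_bits(symbols: np.ndarray) -> float:
    """Shannon entropy (bits/symbol) of the empirical distribution of quantized symbols."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
# Assumed latent tensor (channels, height, width) with Laplacian-like statistics.
y = rng.laplace(loc=0.0, scale=2.0, size=(192, 16, 16))

# Assumed binary ROI mask at latent resolution: the ROI is the top-left quarter.
mask = np.zeros((16, 16))
mask[:8, :8] = 1.0

# Explicit bit allocation: hard-gate the background to zero, then quantize.
y_explicit = np.round(y * mask)  # mask broadcasts over the channel axis
# No gating: quantize the latent as-is (an implicit scheme would let the mask
# guide the encoder's features instead of zeroing latents).
y_plain = np.round(y)

zero_frac_explicit = float(np.mean(y_explicit == 0))
zero_frac_plain = float(np.mean(y_plain == 0))
print(f"zero fraction: explicit={zero_frac_explicit:.2f}, ungated={zero_frac_plain:.2f}")
print(f"entropy: explicit={empirical_entropy_bits(y_explicit):.2f}, "
      f"ungated={empirical_entropy_bits(y_plain):.2f} bits/symbol")
```

With these toy settings roughly three quarters of all symbols become exact zeros under gating. A single learned prior must then fit both this artificial spike and the ROI statistics, which is one plausible reading of why hard gating can hurt entropy modeling.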
