Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization
Jixiang Luo, Yan Wang, Hongwei Qin
TL;DR
This work tackles blur and deformation in learned image compression (LIC) at very low bitrates. It combines Hierarchical-ROI (H-ROI), which allocates bits across multiple foreground regions and one background region, with channel-wise non-linear adaptive quantization, which tightly controls the bitrate. Built on an ELIC-based architecture, the method optimizes a rate-distortion objective using saliency-driven ROI masks, GAN/perceptual losses, and progressive decoding across ROI layers. Experiments show substantial LPIPS improvements and significant bitrate reductions relative to BPG and HiFiC (roughly 0.7× the bits of HiFiC and 0.5× the bits of BPG). The gains are especially pronounced on small faces and text, and PSNR/MS-SSIM are preserved. The approach shows that content-aware ROI masking and non-linear, multi-channel quantization can improve LIC performance at low bitrates. This offers practical gains in visual quality and potential applications in coding for machines.
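One common way to make a rate-distortion objective ROI-aware is to weight each pixel's distortion by its region level. The sketch below illustrates that idea under stated assumptions and is not the paper's exact formulation. The names `hierarchical_roi_mask` and `roi_rate_distortion_loss` are hypothetical. Axis-aligned `(y0, x0, y1, x1, weight)` boxes stand in for the saliency-driven masks. The per-level weights, `background_weight`, and `lam` are illustrative values.

```python
import numpy as np

def hierarchical_roi_mask(shape, regions, background_weight=1.0):
    """Build a per-pixel weight map from several foreground regions.

    Hypothetical input format: regions is a list of (y0, x0, y1, x1, weight)
    boxes standing in for saliency-driven ROI masks. Where boxes overlap,
    the largest weight wins, so a higher-priority level (e.g. a small face)
    is not diluted by a lower-priority one around it.
    """
    mask = np.full(shape, background_weight, dtype=np.float64)
    for y0, x0, y1, x1, weight in regions:
        mask[y0:y1, x0:x1] = np.maximum(mask[y0:y1, x0:x1], weight)
    return mask

def roi_rate_distortion_loss(x, x_hat, bits, mask, lam=0.01):
    """Illustrative loss R + lam * D_roi.

    R is the rate in bits per pixel. D_roi is the mean squared error with
    each pixel scaled by its ROI weight (an assumed form of ROI weighting).
    """
    num_pixels = x.shape[0] * x.shape[1]
    rate_bpp = bits / num_pixels
    sq_err = (x - x_hat) ** 2
    if sq_err.ndim == 3:
        # H x W x C image: broadcast the 2-D mask over the channel axis
        weighted = sq_err * mask[..., None]
    else:
        weighted = sq_err * mask
    distortion = weighted.mean()
    return rate_bpp + lam * distortion
```

Taking the maximum on overlaps lets a nested high-priority region, such as a small face inside a larger person box, keep its own weight. In a GAN-based codec, a term like this would sit alongside the adversarial and perceptual losses rather than replace them.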
Abstract
Learned Image Compression (LIC) has made dramatic progress on both objective and subjective metrics. MSE-based models aim to improve objective metrics, while generative models are used to improve visual quality as measured by subjective metrics. However, both suffer from blurring or deformation at low bit rates, especially below $0.2$ bpp. Moreover, deformation of human faces and text is unacceptable in visual quality assessment, and the problem becomes more prominent for small faces and text. To solve this problem, we combine the advantages of MSE-based models and generative models by using regions of interest (ROI). We propose Hierarchical-ROI (H-ROI), which splits images into several foreground regions and one background region to improve the reconstruction of regions containing faces, text, and complex textures. Further, we propose adaptive quantization via a non-linear mapping along the channel dimension, which constrains the bit rate while maintaining visual quality. Extensive experiments demonstrate that our methods achieve better visual quality on small faces and text at lower bit rates, e.g., $0.7\times$ the bits of HiFiC and $0.5\times$ the bits of BPG.
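The abstract does not give the exact non-linear channel mapping. The sketch below is therefore only an illustration of what channel-wise adaptive quantization could look like. A single rate-control knob `q` sets one quantization step per latent channel through a power-law curve.

All of the following are assumptions, not details from the paper:
- the names `channel_step_sizes` and `adaptive_quantize`;
- the parameters `q`, `max_step`, and `gamma`;
- the power-law form of the mapping;
- the premise that later channels carry less information.

```python
import numpy as np

def channel_step_sizes(num_channels, q, max_step=4.0, gamma=2.0):
    """Hypothetical per-channel quantization steps from a power-law mapping.

    q in [0, 1] is an assumed rate-control knob. q = 0 gives unit steps for
    every channel. Larger q coarsens later channels faster than earlier ones,
    assuming later channels matter less.
    """
    c = np.arange(num_channels, dtype=np.float64)
    position = c / max(num_channels - 1, 1)  # 0 for the first channel, 1 for the last
    return 1.0 + q * (max_step - 1.0) * position ** gamma

def adaptive_quantize(y, q, max_step=4.0, gamma=2.0):
    """Quantize a (C, H, W) latent with one step size per channel.

    Returns the integer symbols (what an entropy coder would receive) and the
    dequantized latent that the decoder would reconstruct from.
    """
    steps = channel_step_sizes(y.shape[0], q, max_step, gamma)[:, None, None]
    symbols = np.round(y / steps)
    return symbols, symbols * steps
```

Raising `q` shrinks symbol magnitudes in the coarsened channels, which typically lowers the entropy-coded rate. For example, with three channels of constant value 3.0 and `q = 1`, the steps are 1.0, 1.75, and 4.0, and the symbols become 3, 2, and 1. A power law is used here simply as a monotone non-linear curve; the paper's mapping may differ.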
