Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation
Han Li, Shaohui Li, Shuangrui Ding, Wenrui Dai, Maida Cao, Chenglin Li, Junni Zou, Hongkai Xiong
TL;DR
The paper addresses image compression for machine and human vision (ICMH), aiming to reduce the training and storage overhead of adapting pre-trained human-vision codecs to machine-vision tasks. It introduces Adapt-ICMH, a plug-and-play framework that inserts Spatial-Frequency Modulation Adapters (SFMA) after the encoder and decoder while freezing the base codec, and trains them with the loss $\mathcal{L} = \mathcal{R} + \lambda \cdot \mathcal{D}(\mathbf{x}, \hat{\mathbf{x}}; \mathcal{G})$ to balance bitrate against task-perceptual distortion. SFMA combines a Spatial Modulation Adapter, which suppresses non-semantic spatial information, with a Frequency Modulation Adapter, which emphasizes task-relevant frequencies, so the latent representation can be adapted efficiently with only a small fraction of trainable parameters. Experiments across multiple learned image compression (LIC) backbones and machine-vision tasks show consistent rate-accuracy gains, reduced training overhead, and compatibility with diverse architectures; the paper also highlights qualitative results and benefits for scalable coding.
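To make the objective $\mathcal{L} = \mathcal{R} + \lambda \cdot \mathcal{D}$ concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that the rate $\mathcal{R}$ is estimated in bits per pixel from the entropy model's likelihoods and that the distortion $\mathcal{D}$ is a mean squared error between features of $\mathbf{x}$ and $\hat{\mathbf{x}}$; both choices, and the names `rate_bpp`, `feature_mse`, and `icmh_loss`, are hypothetical.

```python
import math

def rate_bpp(likelihoods, num_pixels):
    """Estimated rate R in bits per pixel: -sum(log2 p) / num_pixels (assumed form)."""
    return -sum(math.log2(p) for p in likelihoods) / num_pixels

def feature_mse(feat_x, feat_x_hat):
    """Distortion D: mean squared error between two feature vectors (assumed form)."""
    return sum((a - b) ** 2 for a, b in zip(feat_x, feat_x_hat)) / len(feat_x)

def icmh_loss(likelihoods, num_pixels, feat_x, feat_x_hat, lam):
    """Rate-distortion objective L = R + lambda * D."""
    return rate_bpp(likelihoods, num_pixels) + lam * feature_mse(feat_x, feat_x_hat)
```

A larger `lam` weights task distortion more heavily and therefore tolerates a higher bitrate; a smaller `lam` favors compression.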
Abstract
Image compression for machine and human vision (ICMH) has gained increasing attention in recent years. Existing ICMH methods incur high training and storage overheads because they rely on heavy task-specific networks. To address this issue, we develop a novel lightweight adapter-based tuning framework for ICMH, named Adapt-ICMH, that better balances task performance and bitrate with reduced overhead. We propose a spatial-frequency modulation adapter (SFMA) with two components: a spatial modulation adapter that eliminates non-semantic redundancy, and a frequency modulation adapter that enhances task-relevant frequency components while suppressing task-irrelevant ones. The proposed adapter is plug-and-play and compatible with almost all existing learned image compression models without compromising the performance of the pre-trained models. Experiments demonstrate that Adapt-ICMH consistently outperforms existing ICMH frameworks on various machine vision tasks with fewer fine-tuned parameters and reduced computational complexity. Code will be released at https://github.com/qingshi9974/ECCV2024-AdpatICMH .
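The adapter idea described above can be illustrated with a small NumPy sketch. This is not the SFMA architecture from the paper: the class name `SFMASketch`, the channel-bottleneck spatial branch, the per-bin frequency gain, and the zero initialization are all assumptions chosen to show how a residual adapter can modulate a frozen codec's feature map in both the spatial and frequency domains.

```python
import numpy as np

class SFMASketch:
    """Hypothetical residual adapter applied to a frozen feature map of shape (C, H, W)."""

    def __init__(self, channels, height, width, hidden=4, seed=0):
        rng = np.random.default_rng(seed)
        # Spatial branch: channel bottleneck (down-project, ReLU, up-project).
        self.w_down = rng.normal(0.0, 0.02, size=(hidden, channels))
        # Zero-initialized up-projection, so the branch outputs zero at the start.
        self.w_up = np.zeros((channels, hidden))
        # Frequency branch: one learnable gain per channel and rFFT bin, also zero at start.
        self.freq_gain = np.zeros((channels, height, width // 2 + 1))

    def spatial(self, f):
        # Per-pixel channel mixing through the bottleneck.
        h = np.maximum(np.einsum("hc,cyx->hyx", self.w_down, f), 0.0)
        return np.einsum("ch,hyx->cyx", self.w_up, h)

    def frequency(self, f):
        # Scale each frequency bin, then transform back to the spatial domain.
        spec = np.fft.rfft2(f, axes=(-2, -1))
        return np.fft.irfft2(spec * self.freq_gain, s=f.shape[-2:], axes=(-2, -1))

    def __call__(self, f):
        # Residual form: the frozen feature plus both modulation terms.
        return f + self.spatial(f) + self.frequency(f)

    def num_parameters(self):
        return self.w_down.size + self.w_up.size + self.freq_gain.size
```

With the zero initialization assumed here, the adapter starts as an identity mapping, which is one simple way to leave the pre-trained codec's behavior unchanged until the adapter parameters are trained. Only `w_down`, `w_up`, and `freq_gain` would be updated; the base codec stays frozen.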
