Semantically Structured Image Compression via Irregular Group-Based Decoupling
Ruoyu Feng, Yixin Gao, Xin Jin, Runsen Feng, Zhibo Chen
TL;DR
This work addresses inefficiencies in traditional image compression when downstream tasks require selective reconstruction by introducing irregular, semantically guided grouping. A group mask partitions images into irregular groups, and a group-independent transform, instantiated as the GI Swin-Block, enforces independence across groups during encoding and decoding. The framework generates a semantically structured bitstream that supports selective transmission and reconstruction with negligible bitrate overhead, while delivering state-of-the-art or competitive rate-distortion performance and strong performance on downstream tasks such as instance segmentation and pose estimation. The approach also enables flexible applications such as semantically aware encryption, highlighting its practical impact for human-centric and machine-driven vision pipelines.
Abstract
Image compression techniques typically focus on compressing rectangular images for human consumption, and as a result they transmit redundant content for downstream applications. To overcome this limitation, some previous works propose to semantically structure the bitstream, which can meet specific application requirements through selective transmission and reconstruction. Nevertheless, they divide the input image into multiple rectangular regions according to semantics and do not prevent information interaction among them, which wastes bitrate and distorts the reconstruction of region boundaries. In this paper, we propose to decouple an image into multiple groups with irregular shapes based on a customized group mask and to compress them independently. Our group mask describes the image at a finer granularity, enabling significant bitrate savings by reducing the transmission of redundant content. Moreover, to ensure the fidelity of selective reconstruction, this paper proposes the concept of a group-independent transform that maintains independence among distinct groups, and we instantiate it with the proposed Group-Independent Swin-Block (GI Swin-Block). Experimental results demonstrate that our framework structures the bitstream at negligible cost and exhibits superior performance in both visual quality and support for intelligent tasks.
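The idea of a group-independent transform can be illustrated with a minimal sketch. It is not the GI Swin-Block itself: it assumes plain global self-attention without learned projections or shifted windows, and the function name `group_independent_attention` and the toy data are hypothetical. Each token attends only to tokens that share its group label, so the outputs of one group cannot depend on the content of another.

```python
import math


def group_independent_attention(tokens, group_ids):
    """Self-attention in which each token attends only to its own group.

    tokens: list of equal-length float lists (one feature vector per token).
    group_ids: one integer group label per token (a flattened group mask).
    Assumption: queries, keys, and values are the raw tokens (no projections).
    """
    dim = len(tokens[0])
    outputs = []
    for i, query in enumerate(tokens):
        # Restrict the attention set to tokens sharing the query's group label.
        peers = [j for j, g in enumerate(group_ids) if g == group_ids[i]]
        scores = [
            sum(q * k for q, k in zip(query, tokens[j])) / math.sqrt(dim)
            for j in peers
        ]
        # Numerically stable softmax over the in-group scores only.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * tokens[j][d] for w, j in zip(weights, peers)) / total
            for d in range(dim)
        ])
    return outputs


# Toy example: four tokens, two groups (hypothetical values).
tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [2.0, -1.0]]
group_ids = [0, 0, 1, 1]
out = group_independent_attention(tokens, group_ids)

# Perturb a token in group 1 and recompute.
perturbed = [row[:] for row in tokens]
perturbed[2] = [9.0, -9.0]
out_perturbed = group_independent_attention(perturbed, group_ids)

# Group 0 outputs never read group 1 tokens, so they are unaffected.
group0_unchanged = out[:2] == out_perturbed[:2]
group1_changed = out[2:] != out_perturbed[2:]
```

In this sketch, altering a token in group 1 leaves the outputs for group 0 unchanged. This is the property that would allow one group to be transmitted and reconstructed without the others.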
