Semantically Structured Image Compression via Irregular Group-Based Decoupling

Ruoyu Feng, Yixin Gao, Xin Jin, Runsen Feng, Zhibo Chen

TL;DR

This work addresses inefficiencies in traditional image compression when downstream tasks require selective reconstruction by introducing irregular, semantically guided grouping. A group mask partitions images into irregular groups, and a group-independent transform, instantiated as the GI Swin-Block, enforces independence across groups during encoding and decoding. The framework generates a semantically structured bitstream that supports selective transmission and reconstruction with negligible bitrate overhead, while delivering state-of-the-art or competitive rate-distortion performance and strong performance on downstream tasks such as instance segmentation and pose estimation. The approach also enables flexible applications like semantically-aware encryption, highlighting its practical impact for human-centric and machine-driven vision pipelines.

Abstract

Image compression techniques typically focus on compressing rectangular images for human consumption, which results in transmitting redundant content for downstream applications. To overcome this limitation, some previous works propose to semantically structure the bitstream, which can meet specific application requirements through selective transmission and reconstruction. Nevertheless, they divide the input image into multiple rectangular regions according to semantics and do not prevent information interaction among them, which wastes bitrate and distorts the reconstruction of region boundaries. In this paper, we propose to decouple an image into multiple groups with irregular shapes based on a customized group mask and compress them independently. Our group mask describes the image at a finer granularity, enabling significant bitrate savings by reducing the transmission of redundant content. Moreover, to ensure the fidelity of selective reconstruction, this paper proposes the concept of a group-independent transform that maintains the independence among distinct groups, and we instantiate it with the proposed Group-Independent Swin-Block (GI Swin-Block). Experimental results demonstrate that our framework structures the bitstream at negligible cost and exhibits superior performance in both visual quality and support for intelligent tasks.
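
The idea of a group-independent transform can be illustrated with a small sketch of masked self-attention inside one window: each token attends only to tokens that share its group id, so the output for one group cannot depend on the content of another. This is not the authors' GI Swin-Block. It is a minimal single-head illustration in plain Python that assumes identity query/key/value projections, and the function name `group_independent_attention` is hypothetical.

```python
import math


def group_independent_attention(queries, keys, values, group_ids):
    """Single-head attention over one window, restricted to same-group tokens.

    Assumption for illustration: token i may only attend to tokens j with
    group_ids[j] == group_ids[i]. All inputs are lists of equal length.
    """
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        # Tokens in the same group as token i (always includes i itself).
        peers = [j for j, g in enumerate(group_ids) if g == group_ids[i]]
        scores = [
            sum(a * b for a, b in zip(query, keys[j])) / math.sqrt(dim)
            for j in peers
        ]
        # Numerically stable softmax over the peers only.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * values[j][d] for w, j in zip(weights, peers)) / total
            for d in range(len(values[0]))
        ])
    return outputs


# Four tokens in one window: the first two belong to group 0, the rest to group 1.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
group_ids = [0, 0, 1, 1]
baseline = group_independent_attention(tokens, tokens, tokens, group_ids)

# Change only the group-1 tokens and run the same attention again.
perturbed_tokens = [tokens[0], tokens[1], [9.0, -3.0], [-4.0, 7.0]]
perturbed = group_independent_attention(
    perturbed_tokens, perturbed_tokens, perturbed_tokens, group_ids
)
```

In this sketch the outputs for the group-0 tokens are identical in `baseline` and `perturbed`, which is the property that lets a decoder reconstruct one group without receiving the others.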

Paper Structure (19 sections, 2 equations, 14 figures)

Figures (14)

  • Figure 1: The input image is decoupled into groups according to distinct semantics. Then the semantically structured bitstream (SSB) is generated by compressing the image based on the partitioned groups. The SSB facilitates downstream applications by selective bitstream transmission and partial reconstruction, depending on the specific task requirements.
  • Figure 2: The network architecture of our proposed model with the channel-wise auto-regressive model (ChARM). ConvT denotes transposed convolution. AE and AD denote arithmetic encoding and arithmetic decoding, respectively. In the Ours-Hyper model, we remove the ChARM component and instead output $\mu$ and $\sigma$ directly from the hyperdecoder $h_s$.
  • Figure 3: An example of group mask generation. The input image is pre-analyzed by object detection, then the group mask is generated based on the results of the pre-analysis.
  • Figure 4: Illustration of the flexibility to customize the group mask: (a) taking overlapping objects as one group under the guidance of bounding boxes; (b) taking overlapping objects as one group under the guidance of instance masks; (c) taking overlapping objects as distinct groups under the guidance of instance masks.
  • Figure 5: Group-independent window partition of GI Swin-Block under regular and shifted window partition. Self-attention is conducted inside each local region.
  • ...and 9 more figures
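
The group mask generation described in Figures 3 and 4(a) can be sketched as follows: detected bounding boxes are rasterized into an integer mask, and boxes that overlap are merged into a single group. This is a minimal sketch under stated assumptions, not the authors' procedure. The per-pixel granularity, the `(x0, y0, x1, y1)` box format with exclusive upper bounds, the use of 0 for background, and the names `boxes_overlap` and `build_group_mask` are all hypothetical choices made for illustration.

```python
def boxes_overlap(a, b):
    """Return True if two (x0, y0, x1, y1) boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def build_group_mask(height, width, boxes):
    """Rasterize boxes into a group mask; overlapping boxes share one group.

    Assumption: group 0 is background and groups are numbered from 1 in the
    order their first box appears in `boxes`.
    """
    # Union-find over box indices so that chains of overlaps end up merged.
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_overlap(boxes[i], boxes[j]):
                parent[find(j)] = find(i)

    root_to_group = {}
    mask = [[0] * width for _ in range(height)]
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        group = root_to_group.setdefault(find(i), len(root_to_group) + 1)
        # Clip the box to the image before filling it with its group id.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = group
    return mask


# Two overlapping boxes and one isolated box on a 6x8 image.
detections = [(0, 0, 3, 3), (2, 2, 5, 5), (6, 0, 8, 2)]
group_mask = build_group_mask(6, 8, detections)
num_groups = max(max(row) for row in group_mask)
```

Here the first two detections overlap and receive the same group id, while the third forms its own group. Treating overlapping objects as distinct groups, as in Figure 4(c), would require instance masks rather than boxes and is not covered by this sketch.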