Content-Aware Mamba for Learned Image Compression

Yunuo Chen, Zezheng Lyu, Bing He, Hongwei Hu, Qi Wang, Yuan Tian, Li Song, Wenjun Zhang, Guo Lu

TL;DR

The Content-Aware Mamba-based LIC model (CMIC) achieves state-of-the-art rate-distortion performance, surpassing VTM-21.0 by 15.91%, 21.34%, and 17.58% in BD-rate on the Kodak, Tecnick, and CLIC datasets, respectively.

Abstract

Recent learned image compression (LIC) leverages Mamba-style state-space models (SSMs) for global receptive fields with linear complexity. However, the standard Mamba adopts content-agnostic, predefined raster (or multi-directional) scans under strict causality. This rigidity hinders its ability to effectively eliminate redundancy between tokens that are content-correlated but spatially distant. We introduce Content-Aware Mamba (CAM), an SSM that dynamically adapts its processing to the image content. Specifically, CAM overcomes prior limitations with two novel mechanisms. First, it replaces the rigid scan with a content-adaptive token permutation strategy to prioritize interactions between content-similar tokens regardless of their location. Second, it overcomes the sequential dependency by injecting sample-specific global priors into the state-space model, which effectively mitigates the strict causality without multi-directional scans. These innovations enable CAM to better capture global redundancy while preserving computational efficiency. Our Content-Aware Mamba-based LIC model (CMIC) achieves state-of-the-art rate-distortion performance, surpassing VTM-21.0 by 15.91%, 21.34%, and 17.58% in BD-rate on the Kodak, Tecnick, and CLIC datasets, respectively. Code will be released at https://github.com/UnoC-727/CMIC.
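
The content-adaptive token permutation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the nearest-centroid clustering, the fixed-decay recurrence standing in for the selective SSM, and all function names (`content_adaptive_permutation`, `causal_scan`, `content_aware_scan`) are assumptions made purely for illustration. The sample-specific global prior is not modeled here. The sketch only shows the idea of grouping content-similar tokens so that they are scanned consecutively, then restoring the original spatial order.

```python
import numpy as np


def content_adaptive_permutation(tokens, centroids):
    """Group tokens by nearest centroid (an assumed clustering rule).

    tokens:    (N, D) array of token features in raster order.
    centroids: (K, D) array of cluster centres (hypothetical; how they
               are obtained is outside the scope of this sketch).
    Returns the permutation, its inverse, and the per-token cluster labels.
    """
    # Squared Euclidean distance from every token to every centroid: (N, K).
    d2 = ((tokens[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)
    # A stable sort keeps raster order inside each cluster.
    perm = np.argsort(labels, kind="stable")
    inverse = np.empty_like(perm)
    inverse[perm] = np.arange(perm.size)
    return perm, inverse, labels


def causal_scan(x, decay=0.9):
    """Stand-in for an SSM scan: h_t = decay * h_{t-1} + x_t (assumption)."""
    h = np.zeros(x.shape[1], dtype=x.dtype)
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + x[t]
        out[t] = h
    return out


def content_aware_scan(tokens, centroids, decay=0.9):
    """Permute tokens by content, scan causally, and undo the permutation."""
    perm, inverse, labels = content_adaptive_permutation(tokens, centroids)
    scanned = causal_scan(tokens[perm], decay)
    return scanned[inverse], perm, labels


if __name__ == "__main__":
    # Five 1-D tokens that alternate between two content types.
    tokens = np.array([[0.0], [10.0], [0.1], [9.9], [0.2]])
    centroids = np.array([[0.0], [10.0]])
    out, perm, labels = content_aware_scan(tokens, centroids)
    print("scan order:", perm.tolist())
    print("cluster labels:", labels.tolist())
```

In a raster scan, the recurrent state would alternate between the two content types at every step. After the permutation, tokens of the same cluster are adjacent in the scan, so the state carried from one token to the next comes from similar content, regardless of where those tokens sit in the image.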

Paper Structure

This paper contains 46 sections, 10 equations, 22 figures, 10 tables, 1 algorithm.

Figures (22)

  • Figure 1: (a) Illustration of standard 2D Selective Scan. (b) Illustration of our content-adaptive scan: content-correlated tokens are scanned consecutively to better eliminate redundancy. (c) Rate savings relative to VTM-21.0 on the Tecnick dataset. CMIC significantly outperforms two SOTA Mamba-based LIC models: MambaVC [qin2024mambavc] and MambaIC [zeng2025mambaic].
  • Figure 2: Overview of the proposed method. (a) The CMIC framework. Feature dimensions are set as {$C_1, C_2, C_3, C_4$}, and the six non-linear transform stages have depths {$L_1, L_2, L_3, L_3, L_2, L_1$}. (b) The Content-Aware Mamba block, with the Content-Aware SSM architecture shown in detail. Numbers 1-9 represent the indices of tokens, while letters a-d denote distinct cluster categories.
  • Figure 3: Our Entropy Model
  • Figure 4: RD Curves on the Tecnick dataset.
  • Figure 5: RD Curves on the CLIC dataset.
  • ...and 17 more figures