CMamba: Learned Image Compression with State Space Models

Zhuojie Wu; Heming Du; Shuyun Wang; Ming Lu; Haiyang Sun; Yandong Guo; Xin Yu

TL;DR

CMamba tackles the trade-off between rate-distortion performance and computational efficiency in Learned Image Compression by marrying CNNs with State Space Models. It introduces a Content-Adaptive SSM (CA-SSM) that dynamically fuses global content from SSMs with local details from CNNs, and a Context-Aware Entropy (CAE) module that jointly models spatial and channel dependencies to optimize entropy coding. Empirical results on Kodak, Tecnick, and CLIC show BD-Rate reductions relative to VVC and state-of-the-art LIC methods, alongside substantial reductions in parameters, FLOPs, and decoding time on Kodak. This hybrid approach demonstrates that selective scanning and autoregressive channel modeling can achieve practical, scalable compression without sacrificing quality.
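The BD-Rate figures mentioned above compare two rate-distortion curves by the average bitrate difference at equal quality. The sketch below shows the commonly used Bjøntegaard calculation (cubic fit of log-rate against PSNR, integrated over the shared PSNR range). It is a minimal illustration, not the evaluation code used for CMamba. The function name `bd_rate` and the sample numbers are assumptions made up for the example and are not results from the paper.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (percent) of a test codec vs. an anchor at equal PSNR.

    Negative values mean the test codec needs fewer bits than the anchor.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    poly_a = np.polyfit(pa, log_ra, 3)
    poly_t = np.polyfit(pt, log_rt, 3)

    # Integrate both fits over the PSNR interval that the two curves share.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate gap, converted back to a relative bitrate change.
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

if __name__ == "__main__":
    # Illustrative, made-up rate (bpp) / PSNR (dB) points; not measured data.
    anchor_bpp = [0.20, 0.40, 0.70, 1.10]
    anchor_psnr = [30.0, 33.0, 36.0, 39.0]
    test_bpp = [0.9 * r for r in anchor_bpp]  # same quality, 10% fewer bits
    print(bd_rate(anchor_bpp, anchor_psnr, test_bpp, anchor_psnr))
```

With four rate points per codec the cubic fit passes through every point, so a codec that spends 10% fewer bits at each PSNR level comes out at roughly -10% BD-Rate.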

Abstract

Learned Image Compression (LIC) has explored various architectures, such as Convolutional Neural Networks (CNNs) and transformers, for modeling image content distributions in order to compress images effectively. However, achieving high rate-distortion performance while maintaining low computational complexity (i.e., parameters, FLOPs, and latency) remains challenging. In this paper, we propose a hybrid image compression framework based on convolutions and State Space Models (SSMs), termed CMamba, to achieve superior rate-distortion performance with low computational complexity. Specifically, CMamba introduces two key components: a Content-Adaptive SSM (CA-SSM) module and a Context-Aware Entropy (CAE) module. First, we observe that SSMs excel at modeling overall content but tend to lose high-frequency details, whereas CNNs are proficient at capturing local details. Motivated by this, we propose the CA-SSM module, which dynamically fuses the global content extracted by SSM blocks with the local details captured by CNN blocks in both the encoding and decoding stages. As a result, important image content is well preserved during compression. Second, our proposed CAE module is designed to reduce spatial and channel redundancies in the latent representations after encoding. Along the spatial dimension, CAE leverages SSMs to parameterize the spatial content of the latent representations, which significantly improves spatial compression efficiency while reducing spatial content redundancies. Along the channel dimension, CAE reduces inter-channel redundancies of the latent representations in an autoregressive manner, fully exploiting prior knowledge from previous channels without sacrificing efficiency. Experimental results demonstrate that CMamba achieves superior rate-distortion performance.
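The channel-wise autoregressive idea in the abstract can be illustrated with a small sketch: the latent is split into channel slices, and the distribution parameters of each slice are predicted only from slices that have already been coded. This is a toy illustration under stated assumptions, not the CAE module itself. `toy_param_net` is a hypothetical stand-in for a learned network, the SSM-based spatial modeling is omitted entirely, and all function names are invented for the example.

```python
import math
import numpy as np

# Element-wise error function built from the standard library.
_erf = np.vectorize(math.erf, otypes=[float])

def gaussian_bits(y_hat, mean, scale):
    """Bits needed for integer-quantized y_hat under a Gaussian with unit-width bins."""
    def cdf(x):
        return 0.5 * (1.0 + _erf((x - mean) / (scale * math.sqrt(2.0))))
    prob = np.clip(cdf(y_hat + 0.5) - cdf(y_hat - 0.5), 1e-9, 1.0)
    return float(-np.log2(prob).sum())

def toy_param_net(context, slice_shape):
    """Hypothetical predictor: returns (mean, scale) for the next slice.

    A real model would use a learned network; this one just averages the
    already-decoded channels and keeps a unit scale.
    """
    if context.shape[0] == 0:
        return np.zeros(slice_shape), np.ones(slice_shape)
    mean = context.mean(axis=0, keepdims=True) * np.ones(slice_shape)
    return mean, np.ones(slice_shape)

def channel_autoregressive_bits(y_hat, num_slices, param_net=toy_param_net):
    """Estimate total bits for a (C, H, W) latent coded slice by slice."""
    slices = np.array_split(y_hat, num_slices, axis=0)
    decoded = np.zeros((0,) + y_hat.shape[1:])
    total = 0.0
    for s in slices:
        # Only previously coded slices are visible to the predictor.
        mean, scale = param_net(decoded, s.shape)
        total += gaussian_bits(s, mean, scale)
        decoded = np.concatenate([decoded, s], axis=0)
    return total

if __name__ == "__main__":
    # Synthetic latent with identical channels, i.e. strong inter-channel redundancy.
    base = (np.arange(16).reshape(1, 4, 4) - 8).astype(float)
    y_hat = np.repeat(base, 8, axis=0)
    print(channel_autoregressive_bits(y_hat, num_slices=1))  # no channel context
    print(channel_autoregressive_bits(y_hat, num_slices=4))  # conditioned on earlier slices
```

On this synthetic latent the conditioned estimate is lower than the unconditioned one, because later slices are predicted well from earlier ones. The number of slices trades prediction quality against the number of sequential coding steps.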
