S2CFormer: Revisiting the RD-Latency Trade-off in Transformer-based Learned Image Compression

Yunuo Chen; Qian Li; Bing He; Donghui Feng; Ronghua Wu; Qi Wang; Li Song; Guo Lu; Wenjun Zhang

TL;DR

The paper tackles the trade-off between rate-distortion (RD) performance and decoding latency in transformer-based learned image compression (LIC) by shifting focus from complex spatial interactions to efficient channel aggregation. It introduces the S2CFormer paradigm, which pairs a simplified spatial path (separable convolution or window attention) with FFN-based channel aggregation, and shows that channel aggregation is the primary driver of RD performance. The S2C-Identity, S2C-Conv, and S2C-Attention variants achieve state-of-the-art RD performance with significantly faster decoding, and the S2C-Hybrid variant further improves the performance-latency trade-off by using different instantiations at different stages. The results set new benchmarks on the Kodak, Tecnick, and CLIC datasets and highlight the potential of advanced FFN structures as a practical path toward efficient, high-performance LIC systems.

Abstract

Transformer-based Learned Image Compression (LIC) suffers from a suboptimal trade-off between decoding latency and rate-distortion (R-D) performance. Moreover, the critical role of the FeedForward Network (FFN)-based channel aggregation module has been largely overlooked. Our research reveals that efficient channel aggregation, rather than complex and time-consuming spatial operations, is the key to achieving competitive LIC models. Based on this insight, we introduce the "S2CFormer" paradigm, a general architecture that simplifies spatial operations and enhances channel operations to overcome the previous trade-off. We present two instances of S2CFormer: S2C-Conv and S2C-Attention. Both models demonstrate state-of-the-art (SOTA) R-D performance and significantly faster decoding speed. Furthermore, we introduce S2C-Hybrid, an enhanced variant that combines the strengths of different S2CFormer instances to achieve a better performance-latency trade-off. This model outperforms all existing methods on the Kodak, Tecnick, and CLIC Professional Validation datasets, setting a new benchmark for efficient and high-performance LIC. The code is available at https://github.com/YunuoChen/S2CFormer.
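The idea of pairing a simplified spatial operation with FFN-based channel aggregation can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository above for that); it is a generic two-branch residual block written under several assumptions that do not come from the abstract: the per-pixel layer normalization, the GELU activation, the expansion ratio of 4, the 3x3 depthwise kernel standing in for a separable convolution, and all function names (`s2c_block`, `channel_ffn`, `depthwise_conv3x3`, `identity_spatial`) are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each pixel over its channel axis (last axis). Assumed choice of norm.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU. Assumed activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def identity_spatial(x):
    # Simplest possible spatial path: no spatial mixing at all.
    return x

def depthwise_conv3x3(x, kernel):
    # x: (H, W, C), kernel: (3, 3, C). Zero padding; each channel is filtered
    # independently, so this mixes neighbouring pixels but not channels.
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx, :]
    return out

def channel_ffn(x, w1, w2):
    # Per-pixel MLP: expand channels, apply the nonlinearity, project back.
    # This mixes channels but not spatial positions.
    return gelu(x @ w1) @ w2

def s2c_block(x, spatial_op, w1, w2):
    # Two residual branches: a simple spatial path, then channel aggregation.
    x = x + spatial_op(layer_norm(x))
    x = x + channel_ffn(layer_norm(x), w1, w2)
    return x

# Toy usage with random weights (all sizes are illustrative assumptions).
rng = np.random.default_rng(0)
H, W, C, ratio = 8, 8, 16, 4
x = rng.standard_normal((H, W, C))
w1 = rng.standard_normal((C, ratio * C)) * 0.02
w2 = rng.standard_normal((ratio * C, C)) * 0.02
kernel = rng.standard_normal((3, 3, C)) * 0.1

y_identity = s2c_block(x, identity_spatial, w1, w2)
y_conv = s2c_block(x, lambda t: depthwise_conv3x3(t, kernel), w1, w2)
```

In this sketch, swapping `spatial_op` changes only how neighbouring pixels interact, while the channel FFN is shared by every variant. With the identity spatial path, each output pixel depends only on the same input pixel, so all of the block's capacity sits in the channel aggregation.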
