CMC-Bench: Towards a New Paradigm of Visual Signal Compression
Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin
TL;DR
This work introduces CMC-Bench, a benchmark for cross-modality image compression that pairs Image-to-Text (I2T) and Text-to-Image (T2I) models to compress images at ultra-low bitrates. It provides a large-scale dataset (58,000 images) and 160,000 expert subjective scores across four compression modes (Text, Pixel, Image, Full) to jointly assess consistency with the original image and perceptual quality. The study shows that certain I2T+T2I combinations can outperform traditional codecs at very low bitrates, and it outlines current limitations and directions for improving model design and robustness across content types. By releasing ground-truth data, evaluation metrics, and baselines, CMC-Bench aims to accelerate the development of semantic-level visual codecs and invites broad participation from LMM developers.
Abstract
Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1% of the original or even lower, which gives it strong application potential. However, CMC still falls short in consistency with the original image and in perceptual quality. To address this problem, we introduce CMC-Bench, a benchmark of the cooperative performance of Image-to-Text (I2T) and Text-to-Image (T2I) models for image compression. The benchmark covers 18,000 and 40,000 images to evaluate 6 mainstream I2T models and 12 T2I models, respectively, and includes 160,000 subjective preference scores annotated by human experts. At ultra-low bitrates, this paper shows that some combinations of I2T and T2I models surpass the most advanced visual signal codecs; it also highlights where LMMs can be further optimized for the compression task. We encourage LMM developers to participate in this test to promote the evolution of visual signal codec protocols.
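To give a sense of scale for the 0.1% figure, the sketch below computes the bits per pixel and the size ratio when an image is represented only by a text caption. It is an illustration under stated assumptions, not the benchmark's own bitrate accounting: the function name `text_payload_stats`, the 1024x1024 resolution, the 100-byte caption, and the choice of an uncompressed 8-bit RGB raster as the baseline are all hypothetical.

```python
def text_payload_stats(caption: str, width: int, height: int,
                       channels: int = 3, bit_depth: int = 8):
    """Return (bits per pixel, size ratio) for a caption-only payload.

    Assumption: the baseline is an uncompressed raster with the given
    number of channels and bit depth; the payload is the UTF-8 caption.
    """
    payload_bits = len(caption.encode("utf-8")) * 8
    raw_bits = width * height * channels * bit_depth
    bpp = payload_bits / (width * height)   # bits spent per pixel
    ratio = payload_bits / raw_bits         # payload size / raw size
    return bpp, ratio


# Hypothetical example: a 100-byte caption for a 1024x1024 RGB image.
bpp, ratio = text_payload_stats("a" * 100, 1024, 1024)
print(f"bpp = {bpp:.6f}, ratio = {ratio:.6%}")
```

Under these assumptions the caption occupies far less than 0.1% of the raw raster, which is why a text-only payload leaves room for additional side information while staying in the ultra-low bitrate regime.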
