CoD: A Diffusion Foundation Model for Image Compression

Zhaoyang Jia; Zihan Zheng; Naifu Xue; Jiahao Li; Bin Li; Zongyu Guo; Xiaoyi Zhang; Houqiang Li; Yan Lu

TL;DR

CoD is a compression-oriented diffusion foundation model, trained from scratch to jointly optimize encoding and diffusion-based reconstruction, that serves as a reusable backbone for downstream diffusion codecs. It performs especially well at ultra-low bitrates, where it outperforms text-conditioned diffusion backbones, and when paired with DiffC in pixel space it approaches VTM-level PSNR. Training cost stays low and uses only open image datasets. The work also offers insights into scaling behavior, pixel-space versus latent-space diffusion, and zero-shot distortion-perception control, and it reports high-quality compression across standard benchmarks. Overall, CoD lays a foundation for diffusion-based compression research that can be reproduced on accessible data and hardware.

Abstract

Existing diffusion codecs typically build on text-to-image diffusion foundation models such as Stable Diffusion. However, text conditioning is suboptimal from a compression perspective and limits the potential of downstream diffusion codecs, particularly at ultra-low bitrates. To address this, we introduce \textbf{CoD}, the first \textbf{Co}mpression-oriented \textbf{D}iffusion foundation model, trained from scratch to enable end-to-end optimization of both compression and generation. CoD is not a fixed codec but a general foundation model designed for various diffusion-based codecs. It offers several advantages. \textbf{High compression efficiency}: replacing Stable Diffusion with CoD in downstream codecs such as DiffC achieves SOTA results, especially at ultra-low bitrates (e.g., 0.0039 bpp). \textbf{Low-cost and reproducible training}: training is about 300$\times$ faster than Stable Diffusion ($\sim$20 vs. $\sim$6,250 A100 GPU days) and uses entirely open, image-only datasets. \textbf{New insights}: for example, we find that pixel-space diffusion can achieve VTM-level PSNR with high perceptual quality and can outperform GAN-based codecs using fewer parameters. We hope CoD lays the foundation for future diffusion codec research. Code will be released.
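
To put the two headline numbers in the abstract in concrete terms, the short sketch below converts a bitrate in bits per pixel into a total bit budget and computes the training-cost ratio from the quoted GPU days. The 512x512 resolution and the helper names `bit_budget` and `cost_ratio` are illustrative assumptions, not taken from the paper.

```python
def bit_budget(bpp: float, width: int, height: int) -> float:
    """Total bits available for a width x height image at `bpp` bits per pixel."""
    return bpp * width * height


def cost_ratio(baseline_gpu_days: float, model_gpu_days: float) -> float:
    """How many times more GPU days the baseline needs than the model."""
    return baseline_gpu_days / model_gpu_days


if __name__ == "__main__":
    # 0.0039 bpp is the ultra-low bitrate quoted in the abstract;
    # the 512x512 image size is an assumed example resolution.
    bits = bit_budget(0.0039, 512, 512)
    print(f"bit budget: {bits:.1f} bits ({bits / 8:.1f} bytes)")

    # Approximate A100 GPU days quoted in the abstract: ~6,250 vs. ~20.
    print(f"training-cost ratio: {cost_ratio(6250, 20):.1f}x")
```

Under these assumptions, an entire 512x512 image at 0.0039 bpp must be described in roughly a thousand bits, and the quoted GPU-day figures give a ratio of 312.5, consistent with the abstract's rounded "300$\times$".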
