Lossy Compression with Pretrained Diffusion Models

Jeremy Vonderfecht, Feng Liu

TL;DR

The paper addresses the challenge of using pretrained diffusion models for practical lossy image compression by delivering the first complete DiffC implementation on Stable Diffusion variants and Flux-dev. It introduces a fast CUDA-based reverse-channel coding workflow, a greedy timestep scheduling strategy, and an adaptation for Flux that together enable zero-shot compression without extra training, with encoding and decoding each taking under 10 seconds. Empirical results show competitive rate–distortion and perceptual quality at ultra-low bitrates. Fidelity is bounded by the latent diffusion model's VAE and varies across models, with Flux achieving the highest PSNR bound on Kodak. The work demonstrates the practical potential of diffusion-based compression for real-world use, while highlighting speed and model scale as key avenues for further impact.

Abstract

We apply the DiffC algorithm (Theis et al. 2022) to Stable Diffusion 1.5, 2.1, XL, and Flux-dev, and demonstrate that these pretrained models are remarkably capable lossy image compressors. A principled algorithm for lossy compression using pretrained diffusion models has been understood since at least Ho et al. 2020, but challenges in reverse-channel coding have prevented such algorithms from ever being fully implemented. We introduce simple workarounds that lead to the first complete implementation of DiffC, which is capable of compressing and decompressing images using Stable Diffusion in under 10 seconds. Despite requiring no additional training, our method is competitive with other state-of-the-art generative compression methods at ultra-low bitrates.
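
Reverse-channel coding, which the abstract names as the main obstacle, lets a sender communicate a random sample from a target distribution q to a receiver who only knows a prior p and a shared random seed. The sketch below illustrates the idea for a one-dimensional Gaussian using the Poisson functional representation, one standard construction for this task. It is a toy illustration under stated assumptions, not the paper's CUDA implementation: the function names (`rcc_encode`, `rcc_decode`, `log_density_ratio`), the fixed candidate count, and the choice of p = N(0, 1) and q = N(mu, sigma^2) are all hypothetical.

```python
import numpy as np


def log_density_ratio(z, mu, sigma):
    """log q(z) - log p(z) for q = N(mu, sigma^2) and p = N(0, 1)."""
    return -0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma) + 0.5 * z ** 2


def rcc_encode(mu, sigma, seed, n_candidates=4096):
    """Sender: pick the index of a shared prior sample distributed (approximately) as q.

    Assumption: a finite candidate pool truncates the Poisson process, so the
    scheme is only approximate; it is accurate when KL(q || p) is small.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_candidates)                # shared candidates z_i ~ p
    t = np.cumsum(rng.exponential(size=n_candidates))    # Poisson arrival times
    # Poisson functional representation: argmin_i t_i * p(z_i) / q(z_i)
    score = np.log(t) - log_density_ratio(z, mu, sigma)
    return int(np.argmin(score))


def rcc_decode(index, seed, n_candidates=4096):
    """Receiver: regenerate the same candidates from the seed and look up the index."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_candidates)
    return float(z[index])


if __name__ == "__main__":
    index = rcc_encode(mu=1.0, sigma=0.5, seed=0)
    print("transmitted index:", index)
    print("decoded sample:", rcc_decode(index, seed=0))
```

Only the integer index crosses the channel. The receiver never sees `mu` or `sigma`, yet recovers a sample that follows q, and small indices are the most likely, so an entropy coder can store the index in roughly KL(q || p) bits plus a logarithmic overhead. The cost of this naive version grows exponentially with the KL divergence, which hints at why high-dimensional diffusion latents need additional workarounds.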

Paper Structure

This paper contains 26 sections, 8 equations, 8 figures, 2 tables, and 4 algorithms.

Figures (8)

  • Figure 1: Kodak images compressed using our method on Stable Diffusion 1.5, Text-Sketch-PICS (Lei et al. 2023), VTM, and PerCo (Careil et al. 2024). Text+Sketch/VTM/PerCo images are taken from Careil et al. 2024.
  • Figure 2: Rate-distortion curves for generative compression methods across three sets of images. "Hypothetical" refers to hypothetically optimal RD curves assuming ideal reverse-channel coding. Best viewed zoomed in.
  • Figure 3: Visual comparison of generative compression methods.
  • Figure 4: RD curves on Kodak dataset vs. number of RCC steps. Legend shows encoding time per image in seconds.
  • Figure 5: RD curves for Stable Diffusion 1.5 on the Kodak dataset with various ways to determine the $D_{KL}$ per step.
  • ...and 3 more figures