Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression

Amit Vaisman, Guy Ohayon, Hila Manor, Michael Elad, Tomer Michaeli

TL;DR

Turbo-DDCM addresses the slow inference of zero-shot diffusion-based image compression with a fast approach that builds on DDCM. Where DDCM chooses its diffusion noise vectors from reproducible random codebooks one step at a time, Turbo-DDCM uses a closed-form, sparse selection rule that combines many codebook vectors at each step, which greatly reduces the number of denoising steps. A new encoding protocol keeps the resulting bitstream compact. The method achieves round-trip times of around 1.5 seconds per image on standard GPUs while maintaining competitive rate-distortion-perception performance. It also offers two variants: a priority-aware one that favors user-specified regions of interest (ROI), and a distortion-controlled one that targets a given PSNR rather than a given BPP. Together, these contributions form a practical, flexible zero-shot compression framework with potential for real-world deployment and further theoretical development.
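As a rough intuition for why encoding a set of selected indices jointly can save bits, the generic counting argument below compares sending each index on its own with indexing the whole unordered subset at once. This is an illustration only, not the paper's protocol, which this summary does not spell out; the function names and the values `k = 256` and `m = 8` are hypothetical.

```python
import math

def bits_separate(k: int, m: int) -> int:
    """Bits needed when each of the m indices into a k-entry codebook is sent on its own."""
    return m * math.ceil(math.log2(k))

def bits_as_subset(k: int, m: int) -> int:
    """Bits needed to index one unordered m-subset among all C(k, m) possible subsets."""
    return math.ceil(math.log2(math.comb(k, m)))

# Hypothetical sizes, chosen only for illustration.
k, m = 256, 8
separate = bits_separate(k, m)
joint = bits_as_subset(k, m)
print(f"separate: {separate} bits, joint: {joint} bits")
```

The saving comes from not paying for the order of the indices, which carries no information when the selected vectors are simply combined.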

Abstract

While zero-shot diffusion-based compression methods have seen significant progress in recent years, they remain notoriously slow and computationally demanding. This paper presents an efficient zero-shot diffusion-based compression method that runs substantially faster than existing methods, while maintaining performance that is on par with the state-of-the-art techniques. Our method builds upon the recently proposed Denoising Diffusion Codebook Models (DDCMs) compression scheme. Specifically, DDCM compresses an image by sequentially choosing the diffusion noise vectors from reproducible random codebooks, guiding the denoiser's output to reconstruct the target image. We modify this framework with Turbo-DDCM, which efficiently combines a large number of noise vectors at each denoising step, thereby significantly reducing the number of required denoising operations. This modification is also coupled with an improved encoding protocol. Furthermore, we introduce two flexible variants of Turbo-DDCM, a priority-aware variant that prioritizes user-specified regions and a distortion-controlled variant that compresses an image based on a target PSNR rather than a target BPP. Comprehensive experiments position Turbo-DDCM as a compelling, practical, and flexible image compression scheme.
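The central idea of the abstract, combining several vectors from a reproducible codebook at a step instead of choosing just one, can be illustrated with a toy sketch. This is not the authors' implementation: the top-`m` correlation rule, the signed equal-weight combination, the rescaling, and all names (`make_codebook`, `select_atoms`, `combine_atoms`) are assumptions made for illustration, and the `residual` is random data standing in for the gap between the target image and the denoiser's current prediction.

```python
import numpy as np

def make_codebook(step: int, k: int, dim: int, seed: int = 0) -> np.ndarray:
    """Gaussian codebook for one step; the same (seed, step) always yields the same atoms."""
    rng = np.random.default_rng([seed, step])
    return rng.standard_normal((k, dim))

def select_atoms(codebook: np.ndarray, residual: np.ndarray, m: int):
    """Assumed rule: keep the m atoms with the largest absolute correlation to the residual."""
    scores = codebook @ residual
    idx = np.sort(np.argpartition(-np.abs(scores), m - 1)[:m])
    signs = np.sign(scores[idx]).astype(np.int8)
    return idx, signs

def combine_atoms(codebook: np.ndarray, idx: np.ndarray, signs: np.ndarray) -> np.ndarray:
    """Signed sum of the chosen atoms, rescaled to unit standard deviation (an assumption)."""
    noise = (signs[:, None] * codebook[idx]).sum(axis=0)
    return noise / noise.std()

# Hypothetical sizes: 256 atoms, 1024-dimensional latents, 8 atoms per step.
k, dim, m = 256, 1024, 8
residual = np.random.default_rng(123).standard_normal(dim)  # stand-in for the real residual

# Encoder side: pick atoms and build the noise for this step.
encoder_codebook = make_codebook(step=0, k=k, dim=dim)
idx, signs = select_atoms(encoder_codebook, residual, m)
noise_enc = combine_atoms(encoder_codebook, idx, signs)

# Decoder side: regenerate the codebook from the shared seed and reuse the transmitted choices.
decoder_codebook = make_codebook(step=0, k=k, dim=dim)
noise_dec = combine_atoms(decoder_codebook, idx, signs)
```

In this sketch only `idx` and `signs` would need to be transmitted, since the decoder can regenerate the codebook itself and therefore rebuilds exactly the same noise vector.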

Paper Structure

This paper contains 52 sections, 44 equations, 24 figures.

Figures (24)

  • Figure 1: Turbo-DDCM: Our method provides reconstructions with equal or better fidelity compared to previous methods, while being much faster. At the same BPP and runtime, the priority-aware variant (bottom-right) better serves key regions of choice.
  • Figure 2: Turbo-DDCM overview: Building on DDCM, we replace its random noise sampling with an effective and efficient closed-form selection rule that can quickly combine an arbitrary number of noise vectors, enabling significantly fewer diffusion steps. The selected indices are encoded using our new bit transmission protocol, which achieves substantially higher encoding efficiency than DDCM's protocol. The decoder reconstructs the image by running the generative diffusion process while re-selecting the codebook noise vectors that correspond to the decoded indices. This results in a zero-shot compression method that is both highly efficient and competitive in performance.
  • Figure 3: Qualitative results: The presented images are taken from the Kodak24 ($512 \times 512$) dataset. Our method produces highly realistic reconstructions while achieving a speedup ranging from $3\times$ to an order of magnitude compared to previous approaches, depending on the bitrate.
  • Figure 4: Quantitative evaluation: Comparison with zero-shot (top two rows) and other (bottom row) methods, reporting distortion (PSNR, LPIPS), perceptual quality (FID), and runtime (round-trip compression–decompression in seconds). PSC's runtime is omitted due to its extreme complexity (> 300 s/image). Turbo-DDCM achieves superior or competitive results against all zero-shot methods while being substantially faster. Compared to other methods, Turbo-DDCM provides the best perceptual quality. Note that all zero-shot methods and PerCo (SD) operate in the latent space of SD 2.1, whose encoder–decoder imposes a distortion bound; we report this bound by just passing the images through this encoder–decoder.
  • Figure 5: Qualitative results of the priority-aware (PA) variant: Regular methods fail to reconstruct key regions, whereas our PA variant reconstructs them faithfully according to the prioritization mask. In the second row, the first two lines of the sign, which are highly prioritized, are fully reconstructed, while the third line, which has medium priority, is only partially reconstructed. These results are best viewed when zoomed in.
  • ...and 19 more figures