Enhancing the Rate-Distortion-Perception Flexibility of Learned Image Codecs with Conditional Diffusion Decoders

Daniele Mari, Simone Milani

TL;DR

This paper addresses perceptual quality in learned image compression by introducing a conditional diffusion decoder that shares its latent space with a standard decoder. Because the encoder and prior stay fixed and only the decoding step is diffusion-based, the approach yields multiple distortion-perception operating points from the same compressed representation by changing the sampling procedure, without retraining the encoder. The decoder is trained with a diffusion objective combined with a perceptual loss term, and the paper demonstrates tunable tradeoffs on the Kodak dataset, offering a practical path to perceptually driven compression with adjustable computational cost. Overall, the work highlights diffusion decoders as a flexible tool for navigating the rate-distortion-perception frontier in learned image coding.

Abstract

Learned image compression codecs have recently achieved impressive compression performance, surpassing the most efficient image coding architectures. However, most approaches are trained to minimize rate and distortion, which often leads to unsatisfactory visual results at low bitrates because perceptual metrics are not taken into account. In this paper, we show that conditional diffusion models can lead to promising results in the generative compression task when used as a decoder, and that, given a compressed representation, they allow creating new tradeoff points between distortion and perception at the decoder side based on the sampling method.
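
For readers unfamiliar with the term, the rate-distortion-perception tradeoff mentioned above is commonly formalized as follows. This is the standard formulation from the literature (due to Blau and Michaeli), not an equation taken from this paper; the symbols $X$ (source image), $\hat{X}$ (reconstruction), $\Delta$ (distortion measure), and $d$ (divergence between distributions) are notation assumed here for illustration.

```latex
R(D, P) \;=\; \min_{p_{\hat{X} \mid X}} \; I(X; \hat{X})
\quad \text{s.t.} \quad
\mathbb{E}\big[\Delta(X, \hat{X})\big] \le D,
\qquad
d\big(p_X, p_{\hat{X}}\big) \le P
```

Under this view, a codec trained only for rate and distortion ignores the constraint on $d(p_X, p_{\hat{X}})$. Changing the sampling method of a diffusion decoder while the compressed representation stays fixed can be read as moving between different $(D, P)$ pairs at the same rate.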

Paper Structure (5 sections, 5 equations, 3 figures)

Figures (3)

  • Figure 1: Scheme of the proposed network
  • Figure 2: Comparison between an image reconstructed with MSH (first subfigure) and the same image reconstructed with our method (second subfigure) at exactly the same bitrate
  • Figure 3: Comparison between the results of various approaches on the Kodak dataset in terms of no-reference and full-reference metrics