Enhancing the Rate-Distortion-Perception Flexibility of Learned Image Codecs with Conditional Diffusion Decoders
Daniele Mari, Simone Milani
TL;DR
This paper addresses perceptual quality in learned image compression by introducing a conditional diffusion decoder that shares its latent space with a standard decoder. Because the encoder and prior are kept fixed and only the decoding step is diffusion-based, the same compressed representation can be decoded at multiple rate-distortion-perception operating points by changing the sampling procedure, without re-training the encoder. The decoder is trained with a diffusion objective combined with a perceptual-loss term, and the resulting tradeoffs are shown to be tunable on Kodak images, offering a practical path to perceptually driven compression with adjustable computational cost. Overall, the work highlights diffusion decoders as a flexible tool for navigating the rate-distortion-perception frontier in learned image coding.
Abstract
Learned image compression codecs have recently achieved impressive compression performance, surpassing the most efficient image coding architectures. However, most approaches are trained to minimize rate and distortion, which often leads to unsatisfactory visual results at low bitrates because perceptual metrics are not taken into account. In this paper, we show that conditional diffusion models can lead to promising results in the generative compression task when used as a decoder, and that, given a compressed representation, they make it possible to create new tradeoff points between distortion and perception at the decoder side by changing the sampling method.
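To make the idea of decoder-side operating points concrete, the following is a minimal sketch of a generic DDIM-style sampler conditioned on a decoded latent. It is not the authors' implementation: this section does not specify the sampler, so the DDIM update rule, the linear noise schedule, and the names `make_alpha_bar`, `toy_eps_model`, and `ddim_sample` are all assumptions made for illustration. The noise predictor is a hypothetical closed-form stand-in for a trained network. The point is only that `num_steps` and `eta` are chosen at decoding time, so they can change without touching the bitstream.

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal level for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def toy_eps_model(x_t, t, cond, alpha_bar):
    """Hypothetical stand-in for a trained conditional noise predictor.

    It pretends the conditioning signal is exactly the clean image and
    returns the noise that would explain x_t under that belief.
    """
    a = alpha_bar[t]
    return (x_t - np.sqrt(a) * cond) / np.sqrt(1.0 - a)

def ddim_sample(cond, num_steps, eta, alpha_bar, rng):
    """DDIM-style sampling: eta=0 is deterministic, eta=1 is fully stochastic."""
    T = len(alpha_bar)
    timesteps = np.linspace(T - 1, 0, num_steps).round().astype(int)
    x = rng.standard_normal(cond.shape)  # start from pure noise
    for i, t in enumerate(timesteps):
        a_t = alpha_bar[t]
        # After the last step there is no noise left, so the signal level is 1.
        a_prev = alpha_bar[timesteps[i + 1]] if i + 1 < num_steps else 1.0
        eps = toy_eps_model(x, t, cond, alpha_bar)
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        sigma = eta * np.sqrt((1.0 - a_prev) / (1.0 - a_t)) * np.sqrt(1.0 - a_t / a_prev)
        direction = np.sqrt(max(1.0 - a_prev - sigma**2, 0.0)) * eps
        noise = rng.standard_normal(cond.shape) if eta > 0 else 0.0
        x = np.sqrt(a_prev) * x0_pred + direction + sigma * noise
    return x

# The same conditioning signal decoded under different sampler settings.
alpha_bar = make_alpha_bar()
cond = np.linspace(0.0, 1.0, 16).reshape(4, 4)  # placeholder for a decoded latent
fast = ddim_sample(cond, num_steps=5, eta=0.0, alpha_bar=alpha_bar, rng=np.random.default_rng(0))
slow = ddim_sample(cond, num_steps=50, eta=1.0, alpha_bar=alpha_bar, rng=np.random.default_rng(0))
```

Because the stand-in predictor is idealized, every setting here returns the conditioning signal itself. With a real trained network the step count and `eta` would generally yield different reconstructions, and the step count also sets the decoding cost.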
