A Residual Diffusion Model for High Perceptual Quality Codec Augmentation
Noor Fathima Ghouse, Jens Petersen, Auke Wiggers, Tianlin Xu, Guillaume Sautière
TL;DR
DIRAC is a diffusion-based residual augmentation approach to image compression. It pairs a strong base codec with a receiver-side diffusion model that generates the residual between the original image and the base reconstruction, conditioned on that reconstruction. This design allows smooth traversal of the rate-distortion-perception tradeoff at test time. Late-start sampling and rate-dependent thresholding keep sampling efficient, often requiring around 20 steps. The method applies both to generative compression, where it enhances neural base codecs, and to the enhancement of standard codecs such as JPEG and VTM. In both settings it achieves competitive perceptual scores (FID/256, LPIPS) while preserving PSNR. The approach performs strongly on high-resolution datasets and offers a practical path for deploying perceptually rich reconstructions in real-world pipelines.
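To make late-start sampling and rate-dependent thresholding concrete, here is a minimal Python sketch of sampling a residual and adding it to a base reconstruction. It is not the authors' implementation. The cosine noise schedule, the deterministic DDIM-style update, and the near-zero residual used to initialize the late start are all assumptions. The names `sample_residual`, `predict_r0`, `clip`, and `stop_after` are hypothetical. In practice `predict_r0` would be a trained network conditioned on the base reconstruction, and `clip` would be chosen per bitrate. Pixels are flattened into a plain list so the sketch needs only the standard library.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical denoiser: (noisy residual r_t, timestep t, base reconstruction) -> estimate of clean residual r_0.
Denoiser = Callable[[Vector, int, Vector], Vector]


def cosine_alpha_bar(t: float, T: int, s: float = 0.008) -> float:
    """Cumulative signal level of a cosine noise schedule (an assumed choice)."""
    f = math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    f0 = math.cos(s / (1 + s) * math.pi / 2) ** 2
    return f / f0


def sample_residual(base: Vector, predict_r0: Denoiser, T: int = 1000,
                    n_steps: int = 20, t_start: Optional[int] = None,
                    clip: float = 1.0, stop_after: Optional[int] = None,
                    seed: Optional[int] = None) -> Vector:
    """Late-start, deterministic (DDIM-style) sampling of a residual added to base."""
    t_start = T if t_start is None else t_start
    if not 0 < n_steps <= t_start <= T:
        raise ValueError("need 0 < n_steps <= t_start <= T")
    rng = random.Random(seed)
    # Evenly spaced integer timesteps from t_start down to 0 (all but the last are >= 1).
    ts = [round(t_start * (1 - k / n_steps)) for k in range(n_steps + 1)]
    # Late start (assumption): treat the clean residual as ~0, so r_t is noise at the level of t_start.
    a0 = cosine_alpha_bar(ts[0], T)
    r_t = [math.sqrt(1 - a0) * rng.gauss(0.0, 1.0) for _ in base]
    r0_hat = [0.0] * len(base)
    for i in range(n_steps):
        a_t = cosine_alpha_bar(ts[i], T)
        a_next = cosine_alpha_bar(ts[i + 1], T)
        # Thresholding: clip the predicted residual to [-clip, clip]; clip would depend on the rate.
        r0_hat = [max(-clip, min(clip, v)) for v in predict_r0(r_t, ts[i], base)]
        if stop_after is not None and i + 1 >= stop_after:
            break  # early exit: return the current residual estimate
        # DDIM update (eta = 0): recover the implied noise, then step to the next timestep.
        eps = [(x - math.sqrt(a_t) * r0) / math.sqrt(1 - a_t) for x, r0 in zip(r_t, r0_hat)]
        r_t = [math.sqrt(a_next) * r0 + math.sqrt(1 - a_next) * e for r0, e in zip(r0_hat, eps)]
    return [b + r for b, r in zip(base, r0_hat)]
```

Setting `t_start` below `T` skips the early, high-noise part of the reverse process, which is where the step savings in this sketch come from. The `stop_after` argument returns the residual estimate from an intermediate step. This is one hypothetical way to expose a test-time distortion-versus-perception knob, not necessarily how DIRAC does it.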
Abstract
Diffusion probabilistic models have recently achieved remarkable success in generating high-quality image and video data. In this work, we build on this class of generative models and introduce a method for lossy compression of high-resolution images. The resulting codec, which we call DIffusion-based Residual Augmentation Codec (DIRAC), is the first neural codec to allow smooth traversal of the rate-distortion-perception tradeoff at test time, while achieving perceptual quality competitive with GAN-based methods. Furthermore, although sampling from diffusion probabilistic models is notoriously expensive, we show that in the compression setting the number of sampling steps can be drastically reduced.
