High Frequency Matters: Uncertainty Guided Image Compression with Wavelet Diffusion

Juan Song, Jiaxiang He, Lijie Yang, Mingtao Feng, Keyan Wang

TL;DR

UGDiff targets the loss of high-frequency fidelity in diffusion-based image compression. It separates low and high frequencies with a wavelet transform, has a generator synthesize high-frequency content from the low-frequency component, and uses that synthetic signal to condition a wavelet-domain diffusion model that predicts the high-frequency components. The prediction residuals are then compressed by a residual codec trained with an uncertainty-weighted R-D loss. An aleatoric uncertainty map, estimated via Last-Layer Laplace Approximation, steers bit allocation toward uncertain regions, improving the rate-distortion-perception trade-off. On Kodak and CLIC2020, the authors report state-of-the-art R-D performance, stronger perceptual quality, and faster decoding, which they attribute to the sparsity of the high-frequency sub-bands and efficient conditioning. They also report substantial BD-rate savings over both traditional codecs and prior diffusion-based methods, and have made their code available.

Abstract

Diffusion probabilistic models have recently achieved remarkable success in generating high-quality images. However, balancing high perceptual quality against low distortion remains challenging when diffusion models are applied to image compression. To address this issue, we propose a novel Uncertainty-Guided image compression approach with wavelet Diffusion (UGDiff). Our approach focuses on high-frequency compression via the wavelet transform, since high-frequency components are crucial for reconstructing image details. We introduce a wavelet conditional diffusion model for high-frequency prediction, followed by a residual codec that compresses the prediction residuals and transmits them to the decoder. This diffusion prediction-then-residual compression paradigm effectively addresses the low-fidelity issue common in direct reconstructions by existing diffusion models. Considering the uncertainty that arises from the random sampling of the diffusion model, we further design an uncertainty-weighted rate-distortion (R-D) loss tailored for residual compression, providing a more rational trade-off between rate and distortion. Comprehensive experiments on two benchmark datasets validate the effectiveness of UGDiff, which surpasses state-of-the-art image compression methods in R-D performance, perceptual quality, subjective quality, and inference time. Our code is available at: https://github.com/hejiaxiang1/Wavelet-Diffusion/tree/main.
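
The prediction-then-residual idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the uniform quantizer stands in for the learned residual codec, the empirical-entropy rate estimate stands in for a learned entropy model, and the (1 + u) distortion weighting is an assumed form of uncertainty weighting chosen only for illustration. All function names and parameters below are hypothetical.

```python
import math
from collections import Counter


def quantize(residual, step):
    # Uniform scalar quantizer: a stand-in (assumption) for a learned residual codec.
    return [round(r / step) for r in residual]


def dequantize(symbols, step):
    return [s * step for s in symbols]


def empirical_bits(symbols):
    # Total bits under the empirical symbol distribution (a crude rate proxy).
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def weighted_rd_loss(target, predicted, uncertainty, step, lam):
    # Encoder side: only the residual between ground truth and prediction is coded.
    residual = [t - p for t, p in zip(target, predicted)]
    symbols = quantize(residual, step)
    # Decoder side: prediction plus decoded residual.
    recon = [p + r for p, r in zip(predicted, dequantize(symbols, step))]
    rate = empirical_bits(symbols) / len(symbols)
    # Assumed weighting: errors at uncertain positions cost more.
    dist = sum((1.0 + u) * (t - x) ** 2
               for u, t, x in zip(uncertainty, target, recon)) / len(target)
    return rate + lam * dist, recon


# Toy high-frequency coefficients; the last prediction is poor and marked uncertain.
target = [4.0, -3.0, 0.5, 9.0]
predicted = [3.0, -3.5, 0.0, 5.0]
uncertainty = [0.1, 0.1, 0.1, 2.0]
step = 1.0

loss, recon = weighted_rd_loss(target, predicted, uncertainty, step, lam=0.5)
loss_flat, _ = weighted_rd_loss(target, predicted, [0.0] * 4, step, lam=0.5)
loss_high, _ = weighted_rd_loss(target, predicted, [0.0, 5.0, 0.0, 0.0], step, lam=0.5)
```

In this sketch the decoder can reproduce `recon` from the prediction and the transmitted symbols alone, and the reconstruction error per coefficient is bounded by half the quantization step regardless of how poor the prediction was. Raising the uncertainty at a position that still carries error increases the loss, which is the pressure that would push a trained codec to spend more bits there.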

Paper Structure

This paper contains 12 sections, 19 equations, 9 figures, 4 tables, 1 algorithm.

Figures (9)

  • Figure 1: Illustration of image details (b) and the image reconstructed by an end-to-end learned image compression network, cheng2020 (c).
  • Figure 2: Overview of UGDiff. UGDiff adopts a wavelet diffusion predictive coding pipeline. The high-frequency components are predicted by the conditional diffusion model, which is conditioned on synthetic high-frequency content produced by the low-to-high frequency translator. Simultaneously, the uncertainty map of the predicted high-frequency components is estimated along the reverse diffusion sampling process. The residual between the predicted and ground-truth high-frequency components is then compressed with an uncertainty-weighted residual codec. Finally, the reconstructed low- and high-frequency components are inverse-transformed by 2D-IDWT to reconstruct the image.
  • Figure 3: Wavelet decomposition. (a) Source image, (b) wavelet sub-bands, (c) tree-structure diagram of the wavelet decomposition. Within the same region (indicated by the red box), the low- and high-frequency components exhibit strong inter-band correlations and share similar structural information.
  • Figure 4: The forward and reverse process of our conditional diffusion model.
  • Figure 5: Overview of the low-to-high frequency translator.
  • ...and 4 more figures
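
The wavelet decomposition and 2D-IDWT reconstruction mentioned in the captions for Figures 2 and 3 can be sketched with a single-level Haar transform. The Haar wavelet, the pure-Python list representation, and the function names are assumptions made for illustration; the source does not state which wavelet the paper uses, and sub-band naming conventions (LH versus HL) vary between libraries.

```python
def haar_dwt2(img):
    """Single-level 2D Haar transform of a 2D list with even height and width."""
    h, w = len(img), len(img[0])
    ll = [[0.0] * (w // 2) for _ in range(h // 2)]
    lh = [[0.0] * (w // 2) for _ in range(h // 2)]
    hl = [[0.0] * (w // 2) for _ in range(h // 2)]
    hh = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2  # local average (low frequency)
            lh[i // 2][j // 2] = (a - b + c - d) / 2  # left-right difference
            hl[i // 2][j // 2] = (a + b - c - d) / 2  # top-bottom difference
            hh[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal difference
    return ll, (lh, hl, hh)


def haar_idwt2(ll, highs):
    """Inverse of haar_dwt2: rebuilds the image from the four sub-bands."""
    lh, hl, hh = highs
    h, w = len(ll) * 2, len(ll[0]) * 2
    img = [[0.0] * w for _ in range(h)]
    for i in range(len(ll)):
        for j in range(len(ll[0])):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = (s + x + y + z) / 2
            img[2 * i][2 * j + 1] = (s - x + y - z) / 2
            img[2 * i + 1][2 * j] = (s + x - y - z) / 2
            img[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return img


# A 4x4 image with one vertical edge between the first and second columns.
image = [[10.0, 50.0, 50.0, 50.0] for _ in range(4)]
ll, (lh, hl, hh) = haar_dwt2(image)
restored = haar_idwt2(ll, (lh, hl, hh))
```

For this toy image the only nonzero high-frequency coefficients sit in the blocks that contain the edge, while the smooth blocks produce zeros. That sparsity, concentrated where the low-frequency band also changes, is the kind of structure the Figure 3 caption points to.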