Training-Free Rate-Distortion-Perception Traversal With Diffusion

Yuhan Wang, Suzhi Bi, Ying-Jun Angela Zhang

TL;DR

A training-free framework that leverages pre-trained diffusion models to traverse the entire RDP surface. The paper proves that the proposed diffusion decoder is optimal for the distortion-perception tradeoff under AWGN observations, and that the overall framework achieves the optimal RDP function in the Gaussian case.

Abstract

The rate-distortion-perception (RDP) tradeoff characterizes the fundamental limits of lossy compression by jointly considering bitrate, reconstruction fidelity, and perceptual quality. While recent neural compression methods have improved perceptual performance, they typically operate at fixed points on the RDP surface, requiring retraining to target different tradeoffs. In this work, we propose a training-free framework that leverages pre-trained diffusion models to traverse the entire RDP surface. Our approach integrates a reverse channel coding (RCC) module with a novel score-scaled probability flow ODE decoder. We theoretically prove that the proposed diffusion decoder is optimal for the distortion-perception tradeoff under AWGN observations and that the overall framework with the RCC module achieves the optimal RDP function in the Gaussian case. Empirical results across multiple datasets demonstrate the framework's flexibility and effectiveness in navigating the ternary RDP tradeoff using pre-trained diffusion models. Our results establish a practical and theoretically grounded approach to adaptive, perception-aware compression.
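To make the distortion-perception tradeoff under AWGN observations concrete, the sketch below works through a toy scalar case. It is not the paper's diffusion decoder: it assumes a zero-mean scalar Gaussian source observed through additive Gaussian noise, measures perception by the W2 distance, and uses the known Wasserstein-space form $D(P) = D^* + [(P^* - P)_+]^2$, where $D^*$ is the MMSE and $P^*$ is the W2 distance between the source and the MMSE estimate. The function name and parameters are hypothetical.

```python
import math

def dp_point_scalar_gaussian(sigma_x, sigma_n, w2_target):
    """Toy distortion-perception point for X ~ N(0, sigma_x^2), Y = X + N.

    Assumption: N ~ N(0, sigma_n^2) is independent of X, distortion is MSE,
    and perception is the W2 distance between the laws of X and its estimate.
    Returns (minimum MSE, linear gain g) for the estimator X_hat = g * Y.
    """
    var_y = sigma_x**2 + sigma_n**2
    d_mmse = sigma_x**2 * sigma_n**2 / var_y   # D*: MSE of E[X | Y]
    std_mmse = sigma_x**2 / math.sqrt(var_y)   # std of the MMSE estimate
    w2_mmse = sigma_x - std_mmse               # P*: W2(p_X, law of E[X | Y])

    # Perception levels looser than P* cannot reduce the MSE below D*.
    w2 = min(max(w2_target, 0.0), w2_mmse)

    # For zero-mean Gaussians, W2 is the absolute difference of the stds,
    # so the estimate must have std sigma_x - w2.
    gain = (sigma_x - w2) / math.sqrt(var_y)
    distortion = d_mmse + (w2_mmse - w2) ** 2
    return distortion, gain

if __name__ == "__main__":
    for p in (0.0, 0.1, 0.2, 0.3):
        d, g = dp_point_scalar_gaussian(1.0, 1.0, p)
        print(f"W2 <= {p:.1f}: MSE = {d:.4f}, gain = {g:.4f}")
```

At `w2_target = 0` the estimate matches the source distribution exactly and pays an MSE penalty of $(P^*)^2$ over the MMSE; as the perception constraint is relaxed toward $P^*$, the gain shrinks to the MMSE gain and the penalty vanishes.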

Paper Structure

This paper contains 33 sections, 7 theorems, 99 equations, 17 figures, 3 tables, 2 algorithms.

Key Result

Lemma 1

Consider the multivariate Gaussian source $X\sim \mathcal{N}(\boldsymbol{\mu}_0,\boldsymbol{\Sigma}_0)$. Let $\boldsymbol{\mu}_k = \sqrt{\bar{\alpha}_k}\,\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_k=\bar{\alpha}_k\boldsymbol{\Sigma}_0+(1-\bar{\alpha}_k)\mathbf{I}$ for $k\in\{1, \dots, t\}$. Starting from [...] Meanwhile, when $\rho=0$, the variance is $\bar{\alpha}_t\boldsymbol{\Sigma}_0^2\boldsymbol{\Sigma}$ [...]
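The quantities $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_k$ in Lemma 1 can be computed directly. The sketch below assumes they are the marginal mean and covariance of the standard variance-preserving forward process $x_k = \sqrt{\bar{\alpha}_k}\,x_0 + \sqrt{1-\bar{\alpha}_k}\,\epsilon$ with $\epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I})$, which this excerpt does not state explicitly. It also evaluates the score of that Gaussian marginal, $-\boldsymbol{\Sigma}_k^{-1}(x-\boldsymbol{\mu}_k)$. The function names are hypothetical.

```python
import numpy as np

def forward_marginal(mu0, sigma0, alpha_bar_k):
    """Mean and covariance of x_k for a Gaussian source N(mu0, sigma0).

    Assumption: x_k = sqrt(alpha_bar_k) * x_0 + sqrt(1 - alpha_bar_k) * eps,
    with eps ~ N(0, I) independent of x_0.
    """
    mu0 = np.asarray(mu0, dtype=float)
    sigma0 = np.asarray(sigma0, dtype=float)
    mu_k = np.sqrt(alpha_bar_k) * mu0
    sigma_k = alpha_bar_k * sigma0 + (1.0 - alpha_bar_k) * np.eye(mu0.shape[0])
    return mu_k, sigma_k

def gaussian_score(x, mu_k, sigma_k):
    """Score of N(mu_k, sigma_k) at x: -sigma_k^{-1} (x - mu_k)."""
    x = np.asarray(x, dtype=float)
    return -np.linalg.solve(sigma_k, x - mu_k)

if __name__ == "__main__":
    mu0 = [1.0, -2.0]
    sigma0 = [[2.0, 0.5], [0.5, 1.0]]
    mu_k, sigma_k = forward_marginal(mu0, sigma0, alpha_bar_k=0.25)
    print("mu_k =", mu_k)
    print("Sigma_k =\n", sigma_k)
    print("score at origin =", gaussian_score([0.0, 0.0], mu_k, sigma_k))
```

As $\bar{\alpha}_k \to 0$ the marginal approaches $\mathcal{N}(\mathbf{0},\mathbf{I})$, and as $\bar{\alpha}_k \to 1$ it recovers the source distribution.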

Figures (17)

  • Figure 1: The proposed framework to traverse the RDP function using pre-trained diffusion models.
  • Figure 2: Information-theoretical RDP function for scalar Gaussian source (dashed line) and achieved rate, MSE, and W2 distance levels by our scheme (solid dots). (a) The RDP surface. (b) $R(D,P)$ function along DP planes. Different colors represent different rates.
  • Figure 3: Effect of controlling $t$ and $\rho$ on different metrics for the CIFAR-10 dataset. Distortion is quantified by MSE, and perception is measured by LPIPS and FID.
  • Figure 4: Rate-distortion-perception curves on the CIFAR-10 dataset. Distortion levels are quantified by MSE and perception levels are measured by LPIPS.
  • Figure 5: RDP tradeoff traversed by our proposed scheme on the Kodak and DIV2K datasets. We show the results obtained with Stable Diffusion (SD) 2.1 and the Flux model, respectively. More tradeoffs measured in different metrics (e.g., PSNR and FID) can be found in the appendix.
  • ...and 12 more figures
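
For readers who want to reproduce a reference curve like the dashed line described for Figure 2, the sketch below evaluates a closed-form RDP function for a scalar Gaussian source. It rests on assumptions not stated in this summary: distortion is MSE, the perception argument is the squared W2 distance, and the closed form is the one known from prior work on the Gaussian RDP function. The paper's own parameterization may differ, and the function name is hypothetical.

```python
import math

def gaussian_rdp_rate(sigma, distortion, perception_sq_w2):
    """Rate in bits for a scalar Gaussian source with std `sigma`.

    Assumptions: distortion is MSE (must be > 0) and the perception
    constraint is on the squared W2 distance.
    """
    if distortion <= 0:
        raise ValueError("distortion must be positive")
    root_p = math.sqrt(perception_sq_w2)
    threshold = sigma - math.sqrt(abs(sigma**2 - distortion))

    if root_p <= threshold:
        # Perception constraint is active: the rate exceeds the classical one.
        s = sigma - root_p                       # std of the reconstruction
        cross = (sigma**2 + s**2 - distortion) / 2.0
        num = sigma**2 * s**2
        return 0.5 * math.log2(num / (num - cross**2))

    # Perception constraint is inactive: classical Gaussian rate-distortion.
    return max(0.5 * math.log2(sigma**2 / distortion), 0.0)

if __name__ == "__main__":
    for p in (0.0, 0.02, 0.05, 1.0):
        r = gaussian_rdp_rate(1.0, 0.5, p)
        print(f"D = 0.5, P = {p:.2f}: R = {r:.4f} bits")
```

With the perception constraint fully relaxed, the function reduces to the classical $\tfrac{1}{2}\log_2(\sigma^2/D)$; tightening it toward zero raises the rate needed at the same distortion.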

Theorems & Definitions (16)

  • Remark 1
  • Lemma 1
  • proof
  • Proposition 2 (distortion-perception tradeoff in Wasserstein space; Freirich et al., 2021)
  • proof
  • Theorem 3
  • proof
  • Remark 2
  • Theorem 4
  • proof
  • ...and 6 more