Controllable Distortion-Perception Tradeoff Through Latent Diffusion for Neural Image Compression

Chuqin Zhou, Guo Lu, Jiangchuan Li, Xiangyu Chen, Zhengxue Cheng, Li Song, Wenjun Zhang

TL;DR

This work addresses the challenge of simultaneously optimizing pixel-level fidelity and perceptual realism in neural image compression. It introduces a decoder-side adaptive latent diffusion module that transforms decoded latents via a controllable diffusion process, enabling flexible distortion-perception trade-offs without retraining the base codec. An auxiliary encoder guides perceptual optimization during training, and inference relies on DDIM-like latent sampling with a $\tau$-controlled fusion of the original and transformed latents. Experiments show substantial gains in perceptual metrics (e.g., LPIPS-BDRate) at fixed bitrates and across multiple codecs while preserving rate-distortion performance, which makes the approach practical for deploying pretrained codecs with adjustable visual quality. The method offers a broadly compatible, plug-and-play way to balance realism and fidelity in real-world neural image compression pipelines.

Abstract

Neural image compression often faces a challenging trade-off among rate, distortion and perception. While most existing methods typically focus on either achieving high pixel-level fidelity or optimizing for perceptual metrics, we propose a novel approach that simultaneously addresses both aspects for a fixed neural image codec. Specifically, we introduce a plug-and-play module at the decoder side that leverages a latent diffusion process to transform the decoded features, enhancing either low distortion or high perceptual quality without altering the original image compression codec. Our approach facilitates fusion of original and transformed features without additional training, enabling users to flexibly adjust the balance between distortion and perception during inference. Extensive experimental results demonstrate that our method significantly enhances the pretrained codecs with a wide, adjustable distortion-perception range while maintaining their original compression capabilities. For instance, we can achieve more than 150% improvement in LPIPS-BDRate without sacrificing more than 1 dB in PSNR.
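The training-free fusion described above can be pictured with a minimal sketch. The exact fusion rule is not stated in this section, so the code assumes a simple convex combination of the two latents; the function name `fuse_latents`, the $[0, 1]$ range for $\tau$, and the convention that $\tau = 0$ returns the original latent are all illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Sequence


def fuse_latents(
    original: Sequence[float],
    transformed: Sequence[float],
    tau: float,
) -> List[float]:
    """Blend two latent vectors element-wise (hypothetical helper).

    Assumption: a linear blend where tau = 0 keeps the original decoded
    latent and tau = 1 uses the diffusion-transformed latent. The paper's
    actual rule may differ.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if len(original) != len(transformed):
        raise ValueError("latents must have the same length")
    # Convex combination: no learned parameters, so no extra training.
    return [(1.0 - tau) * y + tau * y_t for y, y_t in zip(original, transformed)]


if __name__ == "__main__":
    y_hat = [1.0, 2.0]    # toy stand-in for the decoded latent
    y_trans = [3.0, 6.0]  # toy stand-in for the transformed latent
    for tau in (0.0, 0.5, 1.0):
        print(tau, fuse_latents(y_hat, y_trans, tau))
```

Under these assumptions, sweeping `tau` at inference time moves the decoder input between the two endpoints without touching the codec weights, which is the kind of user-adjustable control the abstract describes.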

Paper Structure

This paper contains 36 sections, 8 equations, 23 figures, 3 tables.

Figures (23)

  • Figure 1: Overview of our proposed method. $\mathcal{D}$ represents a plug-and-play adaptive latent fusion module at the decoder side of a base neural codec. We can achieve different distortion (PSNR) and perception (LPIPS) trade-offs, controlled by $\tau$. For simplicity, quantization and entropy coding are omitted.
  • Figure 1: Trade-offs between bitrate and different metrics for various models tested on the Kodak dataset. Arrows in the plot titles indicate whether high ($\uparrow$) or low ($\downarrow$) values indicate a better score.
  • Figure 2: Illustration of the proposed method. For simplicity, we assume the base NIC is distortion-oriented. (a) represents the inference stage of our proposed pipeline. (b) and (c) represent the training procedures. We first train an auxiliary encoder $g'_a$ for the fixed base neural codec. Then, we train a plug-and-play adaptive latent fusion module to transform the original latent representations into features optimized for perceptual quality.
  • Figure 2: Distortion (PSNR) vs. perception (LPIPS) on Kodak for different rate-distortion-perception tradeoffs.
  • Figure 3: Overview of the latent diffusion process. For simplicity, we omit quantization and entropy coding modules. $\tau$ controls the diffusion process to achieve different tradeoffs.
  • ...and 18 more figures