Idempotence and Perceptual Image Compression

Tongda Xu, Ziran Zhu, Dailan He, Yanghao Li, Lina Guo, Yuanyuan Wang, Zhe Wang, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

TL;DR

This work uncovers a fundamental link between idempotence and perceptual image compression, showing that ideal conditional generative codecs are idempotent and that an unconditional generative model with an idempotence constraint is equivalent to a conditional codec. Building on this, the authors propose a practical inversion-based perceptual codec that leverages a pre-trained unconditional model and an existing MSE codec, without requiring new model training. They provide formal proofs and demonstrate empirically that their approach achieves state-of-the-art perceptual quality (lowest FID) across multiple datasets while preserving rate-distortion-perception guarantees and compatibility with the MSE baseline. While the method incurs higher test-time complexity due to gradient-based inversion, it avoids training multiple conditional models and offers a path to perception-distortion trade-offs using the same bitstream.

Abstract

Idempotence is the stability of an image codec to re-compression. At first glance, it is unrelated to perceptual image compression. However, we find that theoretically: 1) a conditional generative model-based perceptual codec satisfies idempotence; 2) an unconditional generative model with an idempotence constraint is equivalent to a conditional generative codec. Based on this newfound equivalence, we propose a new paradigm of perceptual image codec by inverting an unconditional generative model with idempotence constraints. Our codec is theoretically equivalent to a conditional generative codec, and it does not require training new models. Instead, it only requires a pre-trained mean-square-error (MSE) codec and an unconditional generative model. Empirically, we show that our proposed approach outperforms state-of-the-art methods such as HiFiC and ILLM in terms of Fréchet Inception Distance (FID). The source code is available at https://github.com/tongdaxu/Idempotence-and-Perceptual-Image-Compression.
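
The idea of inverting an unconditional generator under an idempotence constraint can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the "MSE codec" is average pooling followed by uniform quantization, the "unconditional generator" is an orthogonal linear map of a Gaussian latent, and the names `encode`, `generate`, and `invert_with_idempotence` are hypothetical. The decoder searches for a latent whose generated signal re-encodes to the received bitstream.

```python
import numpy as np

rng = np.random.default_rng(0)
n, block, q = 16, 4, 0.5  # signal length, pooling width, quantization step

# Toy stand-in for an MSE codec encoder (assumption): average-pool each block,
# then quantize uniformly. The encoder is deterministic.
A = np.kron(np.eye(n // block), np.full((1, block), 1.0 / block))  # shape (4, 16)

def encode(x):
    return np.round(A @ x / q).astype(int)

# Toy stand-in for an unconditional generator (assumption): an orthogonal
# linear map of a Gaussian latent plus a fixed offset.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
b = rng.standard_normal(n)

def generate(z):
    return Q @ z + b

def invert_with_idempotence(y, steps=200, lr=1.0):
    """Gradient descent on z so that the pooled generate(z) matches the dequantized y."""
    z = rng.standard_normal(n)   # random start, so different calls give different outputs
    target = q * y               # dequantized bitstream
    for _ in range(steps):
        residual = A @ generate(z) - target
        z -= lr * (Q.T @ (A.T @ residual))  # gradient of 0.5 * ||residual||^2 w.r.t. z
    return generate(z)

x = rng.standard_normal(n)           # source signal
y = encode(x)                        # bitstream
x_hat_1 = invert_with_idempotence(y)
x_hat_2 = invert_with_idempotence(y)

print("re-encoding reproduces the bitstream:", np.array_equal(encode(x_hat_1), y))
print("two decodes differ:", not np.allclose(x_hat_1, x_hat_2))
```

In this toy setting the constraint only pins down the pooled values, so the remaining degrees of freedom come from the generator's latent; that is why repeated decoding of the same bitstream yields different reconstructions that all re-encode identically. A real codec would replace the linear maps with a pre-trained generative model and a learned codec, where the inversion is far more expensive.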

Paper Structure (24 sections, 5 theorems, 23 equations, 16 figures, 8 tables)


Key Result

Theorem 1

(Perceptual quality brings idempotence) Denote $X$ as the source, $f(\cdot)$ as the encoder, $Y = f(X)$ as the bitstream, $g(\cdot)$ as the decoder, and $\hat{X} = g(Y)$ as the reconstruction. When the encoder $f(\cdot)$ is deterministic, a conditional generative model-based image codec is also idempotent, i.e.,
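
In this notation, idempotence (stability to re-compression, as defined in the abstract) can be sketched as the condition below. This is an assumed reading built from the definitions of $f$, $g$, $Y$ and $\hat{X}$ above, not the theorem's verbatim statement.

```latex
% Assumed form of idempotence: re-encoding the reconstruction returns the same bitstream.
f(\hat{X}) = f(g(Y)) = Y
\quad\Longleftrightarrow\quad
f\bigl(g(f(X))\bigr) = f(X)
```

Read this way, the codec $g \circ f$ applied twice gives the same bitstream as applying it once.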

Figures (16)

  • Figure 1: A visual comparison of our proposed approach with state-of-the-art perceptual image codecs, such as HiFiC (Mentzer et al., 2020) and ILLM (Muckley et al., 2023).
  • Figure 2: The relationship between idempotence and perceptual image compression.
  • Figure 3: Ablation study on unconditional generative model with FFHQ and ELIC.
  • Figure 4: A visual comparison of our proposed approach with state-of-the-art perceptual image codecs, such as HiFiC (Mentzer et al., 2020) and ILLM (Muckley et al., 2023).
  • Figure 5: Reconstruction diversity of proposed approach.
  • ...and 11 more figures

Theorems & Definitions (9)

  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Corollary 1
  • proof
  • proof
  • proof
  • Theorem 3
  • proof