
Synonymous Variational Inference for Perceptual Image Compression

Zijian Liang, Kai Niu, Changshuo Wang, Jin Xu, Ping Zhang

TL;DR

Perceptual image compression (PIC) is analyzed from a semantic-information perspective using Synonymous Variational Inference (SVI). The method introduces Synonymous Image Compression (SIC), a codec that encodes only the synonymous latent representation and samples the detailed content to generate multiple perceptually similar images. The authors prove that the optimization direction follows a synonymous rate-distortion-perception tradeoff, and they implement a progressive SIC codec whose rate-distortion-perception (RD-P) and perceptual performance is competitive with established PIC methods. The work provides a unified theoretical framework for semantic information in image coding and points to future improvements using adversarial losses.
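The encode-then-sample idea can be illustrated with a toy sketch. It assumes the latent is a flat list of floats, treats the first `n_syn` entries as the synonymous part $\tilde{\boldsymbol{y}}_s$ (quantized by rounding and transmitted), and lets the decoder draw the remaining detail entries from a standard Gaussian as a stand-in for a learned sampler. The functions `encode`, `decode`, and `sample_synset` are hypothetical; the actual SIC codec uses learned transforms and entropy coding, which are omitted here.

```python
import random


def encode(latent, n_syn):
    """Quantize and keep only the first n_syn entries (the toy 'synonymous' part)."""
    return [round(v) for v in latent[:n_syn]]


def decode(code, latent_dim, rng, detail_scale=1.0):
    """Rebuild a full latent: transmitted synonymous values + freshly sampled details."""
    # Gaussian sampling is an assumption standing in for a learned detail sampler.
    detail = [rng.gauss(0.0, detail_scale) for _ in range(latent_dim - len(code))]
    return [float(v) for v in code] + detail


def sample_synset(latent, n_syn, n_samples, seed=0):
    """Encode once, then draw n_samples reconstructions that share the same code."""
    rng = random.Random(seed)
    code = encode(latent, n_syn)
    return code, [decode(code, len(latent), rng) for _ in range(n_samples)]


code, recons = sample_synset([2.3, -0.7, 1.6, 0.4, -1.2, 0.9], n_syn=3, n_samples=4)
print("transmitted:", code)  # transmitted: [2, -1, 2]
for r in recons:
    print([round(v, 2) for v in r])
```

Every reconstruction shares the same transmitted prefix and differs only in the sampled detail entries, which is the sense in which one bitstream yields a set of "synonyms".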

Abstract

Recent contributions of semantic information theory reveal the set-element relationship between semantic and syntactic information, represented as synonymous relationships. In this paper, we propose a synonymous variational inference (SVI) method based on this synonymity viewpoint to re-analyze the perceptual image compression problem. It takes perceptual similarity as a typical synonymous criterion to build an ideal synonymous set (Synset), and approximates the posterior of its latent synonymous representation with a parametric density by minimizing a partial semantic KL divergence. This analysis theoretically proves that the optimization direction of perceptual image compression follows a triple tradeoff that can cover the existing rate-distortion-perception schemes. Additionally, we introduce synonymous image compression (SIC), a new image compression scheme that corresponds to the analytical process of SVI, and implement a progressive SIC codec to fully leverage the model's capabilities. Experimental results demonstrate comparable rate-distortion-perception performance using a single progressive SIC codec, thus verifying the effectiveness of our proposed analysis method.
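In the standard variational treatment of learned compression, part of the KL objective reduces to the expected code length of the quantized latent under its parametric prior. A minimal sketch of that rate term, assuming integer quantization and a per-element Gaussian prior whose mean and scale are already known (as a hyperprior-style entropy model would provide). `gaussian_cdf`, `symbol_bits`, and `rate_bits` are illustrative names, not the authors' code.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def symbol_bits(y_hat, mu, sigma):
    """Bits for integer y_hat: -log2 of the Gaussian prior mass on [y_hat - 0.5, y_hat + 0.5]."""
    p = gaussian_cdf(y_hat + 0.5, mu, sigma) - gaussian_cdf(y_hat - 0.5, mu, sigma)
    return -math.log2(max(p, 1e-12))  # clamp so a tiny probability never gives log(0)


def rate_bits(y_hat, mus, sigmas):
    """Total bits for a quantized latent under an assumed per-element Gaussian prior."""
    return sum(symbol_bits(y, m, s) for y, m, s in zip(y_hat, mus, sigmas))


print(round(symbol_bits(0, 0.0, 1.0), 3))  # 1.385
print(round(rate_bits([0, 1, -3], [0.0, 0.8, 0.0], [1.0, 0.5, 1.0]), 2))
```

Symbols far from the predicted mean receive little prior mass and therefore cost more bits, which is why accurate mean and scale predictions lower the rate.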

Paper Structure

This paper contains 28 sections, 4 theorems, 31 equations, 20 figures, 1 table.

Key Result

Lemma 3.2

When the source considers the existence of an ideal synset $\boldsymbol{\mathcal{X}}$ and the decoder places the reconstructed sample in a reconstructed synset $\tilde{\boldsymbol{\mathcal{X}}}$, the minimization of the expected negative log synonymous likelihood term … in which $\lambda_d$ and $\lambda_p$ are the tradeoff factors for the expected distortion (typically the expected mean-squared error, …
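The role of $\lambda_d$ and $\lambda_p$ can be illustrated with a generic rate-distortion-perception objective evaluated over several reconstructions decoded from one bitstream. This is a hedged sketch, not the exact expression of Lemma 3.2: distortion is the mean-squared error averaged over the samples, and `perception` is any caller-supplied divergence proxy. The `mean_gap` function used in the example is a hypothetical toy, not DISTS or an adversarial loss.

```python
def mse(x, x_hat):
    """Mean-squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rdp_loss(bits, x, x_hats, perception, lambda_d, lambda_p):
    """Generic rate + lambda_d * distortion + lambda_p * perception objective.

    bits:       code length of the shared bitstream
    x_hats:     several reconstructions decoded from that same bitstream
    perception: caller-supplied proxy f(x, x_hat) -> float
    Distortion and perception are averaged over the reconstructions.
    """
    rate = bits / len(x)  # bits per element (bits per pixel for a flattened image)
    dist = sum(mse(x, xh) for xh in x_hats) / len(x_hats)
    perc = sum(perception(x, xh) for xh in x_hats) / len(x_hats)
    return rate + lambda_d * dist + lambda_p * perc


def mean_gap(x, x_hat):
    """Hypothetical toy perception proxy: squared gap between the two means."""
    return (sum(x) / len(x) - sum(x_hat) / len(x_hat)) ** 2


x = [0.1, 0.5, 0.9, 0.3]
samples = [[0.2, 0.4, 0.8, 0.3], [0.1, 0.5, 0.9, 0.3]]
print(rdp_loss(8, x, samples, mean_gap, lambda_d=100.0, lambda_p=10.0))  # ≈ 2.378125
```

Raising `lambda_d` pushes the samples toward the original pixel by pixel, while raising `lambda_p` weights the statistical match measured by the perception proxy.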

Figures (20)

  • Figure 1: An illustration of the optimization directions of synonymous image compression. By continuously minimizing the partial semantic KL divergence $D_{\text{KL},s}\left[q||p_{\tilde{\boldsymbol{y}}_s|\boldsymbol{\mathcal{X}}}\right]$ in latent space, the reconstructed synset $\hat{\boldsymbol{\mathcal{X}}}$ gradually approaches the ideal synset $\boldsymbol{\mathcal{X}}$ until complete overlap occurs. At that point, every sample $\hat{\boldsymbol{x}}_j \in \hat{\boldsymbol{\mathcal{X}}}$ is a "synonym" of the original image sample $\boldsymbol{x}$.
  • Figure 2: Left: Representation of the proposed encoder as a synonymous variational inference model and the corresponding decoder as a generative Bayesian model. The latent representation $\tilde{\boldsymbol{y}}$ combines the synonymous representation $\tilde{\boldsymbol{y}}_s$ and the detailed representation $\tilde{\boldsymbol{y}}_\epsilon$ through some form of merging or splicing. A fully factorized [balle2016end] or hyperprior-like [balle2018variational, minnen2018joint] entropy model can be used for the "Parametric Prior" item, and an autoregressive [minnen2018joint] or parallel [he2021checkerboard] context model can also be used in $\boldsymbol{\theta}_p$. These two types of model can provide accurate probability estimates for $\tilde{\boldsymbol{y}}_s$ or predictions of $\boldsymbol{\hat{y}}_{\epsilon}$. Right: Illustration of the equivalence between the "noisy" latent synset $\tilde{\boldsymbol{\mathcal{Y}}}$ and the ideal synset $\boldsymbol{\mathcal{X}}$.
  • Figure 3: Processing frameworks of SIC. (a): The general framework. (b): The progressive framework.
  • Figure 4: Comparisons of methods using DISTS on different datasets. Each point on the HiFiC and MS-ILLM performance curves comes from a separate model, whereas each of our full performance curves is obtained from a single progressive SIC model.
  • Figure 5: Comparisons of our progressive SIC schemes with different numbers of samples drawn from the reconstructed synset $\hat{\boldsymbol{\mathcal{X}}}$ on different datasets.
  • ...and 15 more figures
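Figure 2 mentions a parallel checkerboard context model [he2021checkerboard]. A minimal sketch of its spatial split, assuming a 2-D latent grid stored as nested lists: one parity class of positions (here $(i+j)$ even, a convention) forms the anchors decoded first, and each remaining position is predicted from its already-decoded anchor neighbours. `checkerboard_split` and `anchor_context` are hypothetical helpers, and the neighbour mean stands in for the learned context network.

```python
def checkerboard_split(height, width):
    """Partition an H x W latent grid into anchor and non-anchor positions.

    Anchors ((i + j) even, a convention) are decoded first without spatial
    context; non-anchors are decoded second, conditioned on the anchors.
    """
    anchors, non_anchors = [], []
    for i in range(height):
        for j in range(width):
            (anchors if (i + j) % 2 == 0 else non_anchors).append((i, j))
    return anchors, non_anchors


def anchor_context(y, i, j):
    """Mean of the in-bounds 4-neighbours of (i, j); for a non-anchor these are all anchors.

    A hypothetical stand-in for the learned context network.
    """
    h, w = len(y), len(y[0])
    neighbours = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
    vals = [y[a][b] for a, b in neighbours if 0 <= a < h and 0 <= b < w]
    return sum(vals) / len(vals)


anchors, non_anchors = checkerboard_split(2, 3)
print(anchors)      # [(0, 0), (0, 2), (1, 1)]
print(non_anchors)  # [(0, 1), (1, 0), (1, 2)]
y = [[1.0, None, 3.0], [None, 5.0, None]]  # only anchors decoded so far
print(anchor_context(y, 0, 1))  # 3.0
```

Because every 4-neighbour of a non-anchor position is an anchor, all non-anchors can be processed in one parallel pass once the anchors are decoded, instead of the serial raster-order pass an autoregressive model needs.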

Theorems & Definitions (8)

  • Definition 3.1
  • Lemma 3.2
  • Theorem 3.3
  • proof
  • Lemma 3.2
  • proof
  • Theorem 3.3
  • proof