On Self-Adaptive Perception Loss Function for Sequential Lossy Compression

Sadaf Salehkalaibar, Buu Phan, Likun Cai, Joao Atz Dick, Wei Yu, Jun Chen, Ashish Khisti

TL;DR

This work introduces the Self-Adaptive Perception Loss Function (PLF-SA) for causal, low-latency sequential lossy compression. PLF-SA adapts to the quality of previously reconstructed frames by modeling the joint distribution of the current source frame and the previous reconstructions. The authors derive an information-theoretic rate-distortion-perception (RDP) framework for first-order Gauss-Markov sources, proving that jointly Gaussian reconstructions are optimal and showing RDP convergence at high rates. Through theoretical analysis and experiments on MovingMNIST and UVG, PLF-SA is shown to mitigate the error permanence associated with PLF-JD and to exploit temporal correlations better than PLF-FMD, delivering improved perceptual quality (LPIPS) and temporal consistency, especially in low-rate regimes. The practical contribution combines a scale-space flow neural video coder with Wasserstein GAN-based perceptual optimization, offering a principled approach to jointly optimizing distortion and perception in sequential video compression.

Abstract

We consider causal, low-latency, sequential lossy compression, with mean squared error (MSE) as the distortion loss and a perception loss function (PLF) to enhance the realism of reconstructions. As the main contribution, we propose and analyze a new PLF that considers the joint distribution between the current source frame and the previous reconstructions. We establish the theoretical rate-distortion-perception function for first-order Markov sources and analyze the Gaussian model in detail. From a qualitative perspective, the proposed metric can simultaneously avoid the error-permanence phenomenon and better exploit the temporal correlation between high-quality reconstructions. The proposed metric is referred to as the self-adaptive perception loss function (PLF-SA), as its behavior adapts to the quality of reconstructed frames. We provide a detailed comparison of the proposed perception loss function with previous approaches through both information-theoretic analysis and experiments on the MovingMNIST and UVG datasets.
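To make the distinction between the three perception losses concrete, the following is a minimal sketch for a two-frame scalar Gauss-Markov toy source. It rests on several assumptions that are not stated in this summary: the perception metric is taken to be the squared 2-Wasserstein distance between zero-mean Gaussians, PLF-FMD is read as comparing the marginals of $X_2$ and $\hat{X}_2$, PLF-JD as comparing $(X_1, X_2)$ with $(\hat{X}_1, \hat{X}_2)$, and PLF-SA as comparing $(\hat{X}_1, X_2)$ with $(\hat{X}_1, \hat{X}_2)$, following the abstract's description. The function names and the toy reconstruction model are hypothetical and are not the authors' code.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)          # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T      # scales column j of v by sqrt(w[j])

def w2_squared(cov_a, cov_b):
    """Squared 2-Wasserstein distance between zero-mean Gaussians."""
    root_b = psd_sqrt(cov_b)
    cross = psd_sqrt(root_b @ cov_a @ root_b)
    return float(np.trace(cov_a + cov_b - 2.0 * cross))

def sub_cov(cov, idx):
    """Covariance of the sub-vector selected by idx."""
    return cov[np.ix_(idx, idx)]

def perception_losses(cov):
    """cov is the 4x4 covariance of (X1, X2, Xhat1, Xhat2)."""
    X1, X2, H1, H2 = 0, 1, 2, 3
    recon = sub_cov(cov, [H1, H2])
    return {
        # Assumed reading of PLF-FMD: marginal of the current frame only.
        "FMD": w2_squared(sub_cov(cov, [X2]), sub_cov(cov, [H2])),
        # Assumed reading of PLF-JD: joint law of all source frames.
        "JD": w2_squared(sub_cov(cov, [X1, X2]), recon),
        # Assumed reading of PLF-SA: current source frame jointly with
        # the previous reconstruction.
        "SA": w2_squared(sub_cov(cov, [H1, X2]), recon),
    }

# Toy setup (hypothetical): X2 = rho*X1 + s*N2 with unit variances.
# Xhat1 is pure noise (a first frame coded at zero rate), and Xhat2
# continues from Xhat1 with the source's correlation, ignoring X2.
rho = 0.6
s = np.sqrt(1.0 - rho**2)
mix = np.array([
    [1.0, 0.0, 0.0, 0.0],   # X1
    [rho, s,   0.0, 0.0],   # X2
    [0.0, 0.0, 1.0, 0.0],   # Xhat1, independent of the source
    [0.0, 0.0, rho, s  ],   # Xhat2, built only from Xhat1
])
cov = mix @ mix.T
losses = perception_losses(cov)
print(losses)
```

In this toy setup the reconstruction pair has exactly the source's joint covariance, so the FMD and JD losses vanish even though $\hat{X}_2$ carries no information about $X_2$. The SA loss is strictly positive, because $\hat{X}_1$ is uncorrelated with $X_2$ while $\hat{X}_2$ is correlated with $\hat{X}_1$. Under the assumed definitions, this is one way to see how a joint-distribution loss can tolerate the propagation of an early error while the self-adaptive loss penalizes it.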

Paper Structure

This paper contains 23 sections, 10 theorems, 6 equations, 9 figures, 5 tables.

Key Result

Theorem 3.2

For first-order Markov sources, a given $(\mathsf{D},\mathsf{P})$ and $\mathsf{R}\in \mathcal{R}(\mathsf{D},\mathsf{P})$, we have

Figures (9)

  • Figure 1: (a) Outputs for MovingMNIST with the first frame compressed at a low bitrate $R_1 = 12$ bits. PLF-SA and PLF-FMD recover from previous errors, while PLF-JD and DCVC-HEM exhibit error permanence. (b) Outputs for UVG with the first frame compressed at a low bitrate $R_1 = 0.144$ bpp. PLF-SA and PLF-FMD maintain color tone, whereas PLF-JD propagates color tone errors. DCVC-HEM struggles to reconstruct details like eye pupils, while PLF models perform better. (c) Outputs for MovingMNIST with the first frame compressed at a high bitrate $R_1 = \infty$ bits. PLF-FMD produces reconstruction errors without maintaining the temporal correlation. PLF-JD propagates the trajectory error, while PLF-SA rectifies the error and preserves the temporal correlation across different frames.
  • Figure 2: System model for a sequential lossy compression.
  • Figure 3: The reconstruction results on the MovingMNIST dataset when the first frame is compressed at a low rate $R_1=12$ bits. Similar to the Gauss-Markov case presented in the low-rate section, both PLF-SA and PLF-FMD demonstrate resilience to prior errors (digit contour errors) by incorporating new information from $X_2$ and $X_3$, while PLF-JD suffers from the error-permanence phenomenon, as it tends to ignore new information. DCVC-HEM exhibits a comparable tendency for error permanence.
  • Figure 4: The reconstruction results on the UVG dataset when the first frame is compressed at a low rate $R_1=0.144$ bpp. $\hat{X}_1$ is shared across all models. Similar to the Gauss-Markov case and MovingMNIST results, PLF-SA and PLF-FMD exhibit robustness to first-frame errors (color tone mismatches), while PLF-JD suffers from error permanence.
  • Figure 5: Reconstruction results on the MovingMNIST dataset for $\infty$-$R_2$-$R_3$ with $R_2 = 2$ bits and $R_3 = 16$ bits. Colored digits highlight trajectory across frames. (a) With small correlation coefficient $0 < \rho \ll \sqrt{\epsilon}$, PLF-FMD preserves direction but loses temporal consistency in digits' contour. PLF-JD and PLF-SA fail to identify the direction in the second frame, but PLF-SA rectifies the error in the third frame. (b) With large correlation coefficient $\sqrt{\epsilon}\ll \rho < 1$, PLF-FMD tends to replicate the first frame without capturing motion effectively, while PLF-JD and PLF-SA show greater generative diversity.
  • ...and 4 more figures

Theorems & Definitions (13)

  • Definition 2.1: Operational RDP region
  • Definition 3.1: Information RDP Region
  • Theorem 3.2
  • Theorem 3.3
  • Definition 1.1: Information RDP Region
  • Proposition 1.2
  • Lemma 1.3
  • Proposition 2.1
  • Theorem 3.1
  • Proposition 3.2
  • ...and 3 more