On Self-Adaptive Perception Loss Function for Sequential Lossy Compression
Sadaf Salehkalaibar, Buu Phan, Likun Cai, Joao Atz Dick, Wei Yu, Jun Chen, Ashish Khisti
TL;DR
This work introduces the self-adaptive perception loss function (PLF-SA) for causal, low-latency sequential lossy compression. PLF-SA constrains the joint distribution of the current frame and the previously reconstructed frames, so its behavior adapts to the quality of those reconstructions. The authors derive an information-theoretic rate-distortion-perception (RDP) framework for first-order Gauss-Markov sources, prove that jointly Gaussian reconstructions are optimal, and show convergence of the RDP function at high rates. Theoretical analysis and experiments on MovingMNIST and UVG show that PLF-SA mitigates the error permanence associated with PLF-JD and exploits temporal correlation better than PLF-FMD, improving perceptual quality (LPIPS) and temporal consistency, especially in low-rate regimes. On the practical side, the method combines a scale-space flow neural video coder with Wasserstein GAN-based perceptual optimization, giving a principled way to jointly optimize distortion and perception in sequential video compression.
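The three perception constraints compared above can be written side by side. The block below is a sketch in assumed notation, not the paper's own statement: $X_t$ is the source frame at time $t$, $\hat{X}_t$ its reconstruction, $\phi$ a divergence between distributions, and $P_t$ the perception budget for frame $t$. The PLF-SA line follows the description in the abstract; the PLF-FMD and PLF-JD lines assume the usual readings of those acronyms (framewise marginal distribution and joint distribution).

```latex
\begin{align}
\text{PLF-FMD:}\quad & \phi\big(P_{X_t},\; P_{\hat{X}_t}\big) \le P_t, \\
\text{PLF-JD:}\quad  & \phi\big(P_{X_1,\dots,X_{t-1},X_t},\; P_{\hat{X}_1,\dots,\hat{X}_{t-1},\hat{X}_t}\big) \le P_t, \\
\text{PLF-SA:}\quad  & \phi\big(P_{\hat{X}_1,\dots,\hat{X}_{t-1},X_t},\; P_{\hat{X}_1,\dots,\hat{X}_{t-1},\hat{X}_t}\big) \le P_t.
\end{align}
```

Read this way, PLF-SA differs from PLF-JD only in the reference distribution: the past source frames are replaced by the past reconstructions, which is why the constraint depends on how good those reconstructions are.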
Abstract
We consider causal, low-latency, sequential lossy compression, with mean squared error (MSE) as the distortion loss and a perception loss function (PLF) to enhance the realism of reconstructions. As the main contribution, we propose and analyze a new PLF that considers the joint distribution between the current source frame and the previous reconstructions. We establish the theoretical rate-distortion-perception function for first-order Markov sources and analyze the Gaussian model in detail. Qualitatively, the proposed metric simultaneously avoids the error-permanence phenomenon and better exploits the temporal correlation between high-quality reconstructions. We refer to the proposed metric as the self-adaptive perception loss function (PLF-SA), as its behavior adapts to the quality of the reconstructed frames. We compare the proposed perception loss function in detail with previous approaches through both information-theoretic analysis and experiments on the Moving MNIST and UVG datasets.
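To make the source model concrete, here is a minimal sketch of a scalar, stationary first-order Gauss-Markov source, $X_t = \rho X_{t-1} + N_t$ with independent Gaussian innovations. The function name `simulate_gauss_markov` and the parameter values are illustrative assumptions, not taken from the paper; the paper's model may be vector-valued or use different parameters. The sketch only shows the temporal correlation that a sequential coder can exploit and does not implement any compression scheme.

```python
import numpy as np


def simulate_gauss_markov(rho, sigma2, num_frames, num_paths, seed=0):
    """Draw sample paths of X_t = rho * X_{t-1} + N_t (hypothetical helper).

    The innovation variance is (1 - rho^2) * sigma2, so every X_t has
    variance sigma2 (stationary process).
    """
    rng = np.random.default_rng(seed)
    x = np.empty((num_paths, num_frames))
    # First frame is drawn from the stationary distribution N(0, sigma2).
    x[:, 0] = rng.normal(0.0, np.sqrt(sigma2), size=num_paths)
    innovation_std = np.sqrt((1.0 - rho**2) * sigma2)
    for t in range(1, num_frames):
        noise = rng.normal(0.0, innovation_std, size=num_paths)
        x[:, t] = rho * x[:, t - 1] + noise
    return x


# Assumed example values: correlation 0.9, unit variance, three frames.
paths = simulate_gauss_markov(rho=0.9, sigma2=1.0, num_frames=3, num_paths=200_000)

# Empirical correlations between frame 1 and frames 2 and 3.
lag1_corr = np.corrcoef(paths[:, 0], paths[:, 1])[0, 1]
lag2_corr = np.corrcoef(paths[:, 0], paths[:, 2])[0, 1]
print(f"lag-1 correlation: {lag1_corr:.3f}")
print(f"lag-2 correlation: {lag2_corr:.3f}")
```

For this model the correlation between frames $k$ steps apart is $\rho^k$, so the two printed values should be close to 0.9 and 0.81 up to sampling noise.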
