Exploiting Latent Properties to Optimize Neural Codecs
Muhammet Balcilar, Bharath Bhushan Damodaran, Karam Naser, Franck Galpin, Pierre Hellier
TL;DR
This work presents two orthogonal, retraining-free enhancements for learned neural codecs: (i) replacing conventional scalar quantization with predefined uniform vector quantization on fixed space-tessellation grids (Hex-Quant/Oct-Quant) to exploit latent redundancy, and (ii) using the entropy gradient available at the decoder as a proxy for the reconstruction gradient, motivated by the KKT conditions, to perform Latent Shift after decoding. The proposed methods yield consistent bitrate savings of roughly $1$–$3\%$ across multiple image and video codecs and also improve traditional codecs slightly. The results show strong gains when both approaches are combined, with the Latent Shift benefit depending on how the entropy gradient relates to the reconstruction gradient. The paper also analyzes complexity, demonstrates robustness across datasets, and discusses practical deployment considerations, including extensions to traditional codecs such as JVET ECM-10.0.
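To make the idea of a fixed tessellation grid concrete, the following is a minimal sketch of nearest-point quantization on a 2D hexagonal lattice. It is not the paper's Hex-Quant implementation: the function names, the pairing of consecutive latent values into 2D vectors, and the `step` scale parameter are all assumptions made for illustration, and entropy coding of the resulting lattice points is not covered.

```python
import math

SQRT3 = math.sqrt(3.0)

def _nearest_in_coset(x, y, ox, oy):
    # Nearest point of the rectangular lattice {(a + ox, b*sqrt(3) + oy)}, a, b integers.
    qx = round(x - ox) + ox
    qy = round((y - oy) / SQRT3) * SQRT3 + oy
    return qx, qy

def hex_quantize(x, y):
    # The unit hexagonal lattice is the union of two rectangular cosets,
    # so the nearest lattice point is the closer of the two coset candidates.
    candidates = [
        _nearest_in_coset(x, y, 0.0, 0.0),
        _nearest_in_coset(x, y, 0.5, SQRT3 / 2.0),
    ]
    return min(candidates, key=lambda q: (q[0] - x) ** 2 + (q[1] - y) ** 2)

def hex_quantize_latents(latents, step=1.0):
    # Hypothetical helper: pair consecutive latent values into 2D vectors
    # (an assumption of this sketch) and snap each pair to the scaled lattice.
    if len(latents) % 2 != 0:
        raise ValueError("expected an even number of latent values")
    out = []
    for i in range(0, len(latents), 2):
        qx, qy = hex_quantize(latents[i] / step, latents[i + 1] / step)
        out.extend([qx * step, qy * step])
    return out
```

Each pair of values is mapped to the closest hexagonal cell center instead of being rounded independently per dimension, which is the difference from scalar quantization that the summary refers to.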
Abstract
End-to-end image and video codecs are becoming increasingly competitive with traditional compression techniques that have been developed through decades of manual engineering. These trainable codecs have many advantages over traditional techniques, such as their straightforward adaptation to perceptual distortion metrics and their high performance in specific domains thanks to their learning ability. However, current state-of-the-art neural codecs do not fully exploit the benefits of vector quantization or the availability of the entropy gradient at the decoder. In this paper, we propose to leverage these two properties (vector quantization and the entropy gradient) to improve the performance of off-the-shelf codecs. First, we demonstrate that non-uniform scalar quantization cannot improve performance over uniform quantization. We therefore suggest using predefined optimal uniform vector quantization to improve performance. Second, we show that the entropy gradient, which is available at the decoder, is correlated with the reconstruction error gradient, which is not available at the decoder. We therefore use the former as a proxy for the latter to enhance compression performance. Our experimental results show that these approaches save between 1 and 3% of the rate for the same quality across various pretrained methods. In addition, the entropy-gradient-based solution significantly improves traditional codec performance as well.
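The entropy-gradient idea can be illustrated with a small sketch. It assumes a per-element Gaussian entropy model integrated over a unit-width quantization bin, which is a common choice in learned codecs but is not stated in the text above; the function names and the step size `rho` are hypothetical. The decoder knows the quantized latent and the entropy parameters, so it can compute the gradient of the rate with respect to the latent and shift the latent along it before reconstruction. The sign and magnitude of `rho` are assumptions here and would have to be tuned.

```python
import math

def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bin_probability(y_hat, mu, sigma):
    # Probability mass of the unit-width bin centered on y_hat (clamped for stability).
    p = gaussian_cdf(y_hat + 0.5, mu, sigma) - gaussian_cdf(y_hat - 0.5, mu, sigma)
    return max(p, 1e-12)

def rate_bits(y_hat, mu, sigma):
    # Estimated cost in bits of coding y_hat under the assumed entropy model.
    return -math.log2(bin_probability(y_hat, mu, sigma))

def rate_gradient(y_hat, mu, sigma):
    # Analytic derivative of rate_bits with respect to y_hat.
    dp = gaussian_pdf(y_hat + 0.5, mu, sigma) - gaussian_pdf(y_hat - 0.5, mu, sigma)
    return -dp / (bin_probability(y_hat, mu, sigma) * math.log(2.0))

def latent_shift(y_hat, mu, sigma, rho):
    # Shift each decoded latent along its entropy gradient; rho is a signed,
    # hypothetical step size that a real system would tune.
    return [y + rho * rate_gradient(y, m, s) for y, m, s in zip(y_hat, mu, sigma)]
```

For example, `latent_shift([2.0, -1.0], [0.0, 0.0], [1.0, 1.0], rho=0.05)` returns slightly adjusted latents that a decoder could pass to its synthesis network in place of the originals. Because everything used here is already available at the decoder, no additional bits need to be transmitted beyond whatever is used to signal the step size.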
