Exploiting Latent Properties to Optimize Neural Codecs

Muhammet Balcilar; Bharath Bhushan Damodaran; Karam Naser; Franck Galpin; Pierre Hellier

TL;DR

This work presents two orthogonal, retraining-free enhancements for learned neural codecs: (i) replacing conventional scalar quantization with predefined uniform vector quantization on fixed space-tessellation grids (Hex-Quant/Oct-Quant) to exploit latent redundancy, and (ii) Latent Shift, performed after decoding, which uses the entropy gradient available at the decoder as a proxy (motivated by KKT conditions) for the reconstruction gradient. The proposed methods yield consistent bitrate savings of roughly $1\%$ to $3\%$ across multiple image and video codecs and also improve traditional codecs slightly. The results show strong gains when both approaches are combined, and the Latent Shift benefit correlates with how strongly the entropy and reconstruction gradients are correlated. The paper also analyzes complexity, demonstrates robustness across datasets, and discusses practical deployment considerations, including an extension to traditional codecs such as JVET ECM-10.0.
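To make the Hex-Quant idea concrete, here is a minimal sketch of nearest-point quantization on a unit-area hexagonal lattice, compared against rounding to the integer (square) grid. It is an illustration under stated assumptions, not the paper's implementation: the names `hex_quantize`, `square_quantize`, and `mse_per_dim` are hypothetical, and the lattice orientation and scaling are choices made here. The lattice is treated as the union of two shifted rectangular grids, and each point is assigned to the nearer of the two candidates.

```python
import numpy as np

# Spacing chosen so that each hexagonal cell has area 1 (an assumption of this sketch).
A = np.sqrt(2.0 / np.sqrt(3.0))
STEP = np.array([A, A * np.sqrt(3.0)])   # steps of the rectangular sub-grid
OFFSET = STEP / 2.0                      # shift of the second rectangular sub-grid


def hex_quantize(points):
    """Map 2D points (last axis of size 2) to the nearest hexagonal-lattice point."""
    points = np.asarray(points, dtype=float)
    # Candidate from the first rectangular sub-grid.
    cand0 = np.round(points / STEP) * STEP
    # Candidate from the second, shifted rectangular sub-grid.
    cand1 = np.round((points - OFFSET) / STEP) * STEP + OFFSET
    d0 = np.sum((points - cand0) ** 2, axis=-1)
    d1 = np.sum((points - cand1) ** 2, axis=-1)
    # Keep whichever candidate is closer.
    return np.where((d0 <= d1)[..., None], cand0, cand1)


def square_quantize(points):
    """Uniform scalar quantization: round each coordinate independently."""
    return np.round(np.asarray(points, dtype=float))


def mse_per_dim(points, quantizer):
    """Mean squared quantization error per coordinate."""
    points = np.asarray(points, dtype=float)
    return float(np.mean((points - quantizer(points)) ** 2))
```

For a source that is roughly uniform over many cells, the per-dimension error of the square grid is about $1/12 \approx 0.0833$, while the unit-area hexagonal grid gives about $5/(36\sqrt{3}) \approx 0.0802$. Both grids have the same cell volume, so the lower distortion is the kind of gain that vector quantization offers over scalar quantization.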

Abstract

End-to-end image and video codecs are becoming increasingly competitive with traditional compression techniques, which have been developed through decades of manual engineering effort. These trainable codecs have many advantages over traditional techniques, such as their straightforward adaptation to perceptual distortion metrics and their high performance in specific fields thanks to their learning ability. However, current state-of-the-art neural codecs do not fully exploit the benefits of vector quantization or the existence of the entropy gradient in decoding devices. In this paper, we propose to leverage these two properties (vector quantization and entropy gradient) to improve the performance of off-the-shelf codecs. Firstly, we demonstrate that using non-uniform scalar quantization cannot improve performance over uniform quantization. We thus suggest using predefined optimal uniform vector quantization to improve performance. Secondly, we show that the entropy gradient, available at the decoder, is correlated with the reconstruction error gradient, which is not available at the decoder. We therefore use the former as a proxy to enhance compression performance. Our experimental results show that these approaches save between 1% and 3% of the rate at the same quality across various pretrained methods. In addition, the entropy-gradient-based solution improves traditional codec performance significantly as well.
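The proxy argument can be sketched with a first-order stationarity condition. This is an illustrative derivation under assumptions, not the paper's exact formulation: $R$ (rate), $D$ (distortion), $\lambda$ (trade-off weight), $\rho$ (step size), and the shifted latent $\hat{y}_s$ are symbols introduced here, and the latent is assumed to sit at a stationary point of the rate-distortion Lagrangian.

```latex
\begin{align}
\mathcal{L}(y) &= R(y) + \lambda\, D(y), \\
\nabla_y \mathcal{L}(y^\star) = 0
  &\;\Longrightarrow\;
  \nabla_y D(y^\star) = -\tfrac{1}{\lambda}\, \nabla_y R(y^\star), \\
\hat{y}_s &= \hat{y} + \rho\, \nabla_{\hat{y}} R(\hat{y}), \qquad \rho > 0, \\
D(\hat{y}_s) - D(\hat{y})
  &\approx \rho\, \nabla_{\hat{y}} D(\hat{y})^{\top} \nabla_{\hat{y}} R(\hat{y})
  = -\tfrac{\rho}{\lambda}\, \lVert \nabla_{\hat{y}} R(\hat{y}) \rVert^{2} \le 0 .
\end{align}
```

In practice quantization and an imperfect encoder mean the stationarity condition holds only approximately, which fits the abstract's wording that the two gradients are correlated rather than equal. The update rule actually used and the way its step size is selected are not given in this excerpt, so the third line is an assumption.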

Paper Structure (20 sections, 3 theorems, 22 equations, 7 figures, 7 tables)

Introduction
Problem statement and State of the Art
Uniform Vector Quantization
Space Tessellation Grids
Uniform VQ with Off-the-shelf Neural Codec
Forgotten Information: The Entropy Gradient
Experimental Results
Main Results
Complexity Analysis
Latent Shift versus Alternatives
Enhancing Traditional Codecs with Latent Shift
Latent Shift after Fine-tuning Solutions
Conclusion
Proof of Theorem 1
Advantage of Space Tessellation Grid on Quantization
...and 5 more sections

Key Result

Theorem 1

If a neural codec has an encoder block $g_a: \mathbb{R}^{n \times n \times 3} \rightarrow \mathbb{R}^{m \times m \times o}$, a decoder block $g_s: \mathbb{R}^{m \times m \times o} \rightarrow \mathbb{R}^{n \times n \times 3}$ and it requires a non-uniform SQ map for optimal rate-distortion performance ...

Figures (7)

Figure 1: Block diagram of state-of-the-art neural codecs. The five dark green blocks are the trainable blocks implemented by neural networks, while the binary patterns show the quantization and entropy encoding/decoding processes driven by the entropy models' PMFs on the main and side latents.
Figure 2: a) $f: y \rightarrow z$ transforms a non-uniform quantization map (grid borders $b_0,\dots,b_n$ and grid centers $c_1,\dots,c_n$) into a uniform map (centers lie on the integers and each border lies midway between two consecutive centers). b) Uniform SQ grids in 2D. c) Optimal uniform VQ grids in 2D.
Figure 3: RD performance of uniform SQ, regular-hexagon Hex-Quant, and truncated-octahedron Oct-Quant grids of different volumes. a) Uniform source sampled from $U(-4,4)$, with a zoom-in where the grid volume is unitary. b) Different Gaussian sources. c) RD plot of unitary-volume grids for Gaussian sources.
Figure 4: Correlation between the gradients of the entropy and of the reconstruction error w.r.t. the main latents.
Figure 5: BD-rates of our proposals relative to the baseline codecs at different quality levels. a) mbt2018mean image codec on the Kodak test set. b) SSF video codec on the UVG test set. c) Correlation between the improvement in reconstruction quality and the correlation of the gradients on the CLIC and Kodak datasets.
...and 2 more figures
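The map described in the Figure 2a caption can be illustrated with a small sketch. It assumes a piecewise-linear, monotone $f$ that sends each center $c_k$ to the integer $k$ and each border $b_k$ to $k + 0.5$. The piecewise-linear form and the function names below are assumptions made for illustration, since the caption only fixes where centers and borders land. Rounding in the transformed domain then reproduces the non-uniform quantizer.

```python
import numpy as np


def make_uniformizing_map(borders, centers):
    """Build f (non-uniform -> uniform) and its inverse from borders b_0..b_n and centers c_1..c_n."""
    borders = np.asarray(borders, dtype=float)   # b_0 < b_1 < ... < b_n
    centers = np.asarray(centers, dtype=float)   # b_{k-1} < c_k < b_k
    n = centers.size
    # Interleave as b_0, c_1, b_1, c_2, ..., c_n, b_n.
    knots = np.empty(2 * n + 1)
    knots[0::2] = borders
    knots[1::2] = centers
    # Targets 0.5, 1, 1.5, ..., n + 0.5: centers land on integers, borders on half-integers.
    targets = 0.5 + 0.5 * np.arange(2 * n + 1)

    def f(y):
        return np.interp(y, knots, targets)

    def f_inv(z):
        return np.interp(z, targets, knots)

    return f, f_inv, n


def quantize_via_uniform(y, f, f_inv, n):
    """Non-uniform quantization expressed as: transform, round uniformly, transform back."""
    k = np.clip(np.round(f(y)), 1, n)   # keep the index inside 1..n
    return f_inv(k)


def quantize_direct(y, borders, centers):
    """Reference: look up the cell of y among the interior borders, return its center."""
    idx = np.digitize(y, np.asarray(borders, dtype=float)[1:-1])
    return np.asarray(centers, dtype=float)[idx]
```

Away from the cell borders, `quantize_via_uniform` and `quantize_direct` return the same centers. That is the sense in which a non-uniform scalar quantizer can be folded into a transform followed by plain rounding.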

Theorems & Definitions (8)

Theorem 1
Remark 1
Remark 2
Theorem 2
Corollary 2.1
proof
proof
proof
