Exploiting Latent Properties to Optimize Neural Codecs

Muhammet Balcilar, Bharath Bhushan Damodaran, Karam Naser, Franck Galpin, Pierre Hellier

TL;DR

This work presents two orthogonal, retraining-free enhancements for learned neural codecs: (i) replacing conventional scalar quantization with predefined uniform vector quantization using fixed space-tessellation grids (Hex-Quant/Oct-Quant) to exploit latent redundancy, and (ii) leveraging the entropy gradient available at the decoder as a proxy for the reconstruction-gradient via KKT conditions to perform Latent Shift after decoding. The proposed methods yield consistent bitrate savings of roughly $1$–$3\%$ across multiple image and video codecs and also improve traditional codecs slightly. The results show strong gains when combining both approaches, with the Latent Shift benefit correlating to gradient relationships between entropy and reconstruction terms. The paper also analyzes complexity, demonstrates robustness across datasets, and discusses practical deployment considerations, including extensions to traditional codecs like JVET ECM-10.0.
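
To make the vector-quantization idea concrete, the following is a minimal sketch that compares plain rounding (uniform SQ) with a nearest-point quantizer on a unit-area hexagonal lattice for a 2D uniform source. The function names (`hex_quantize`, `mse_per_dim`), the lattice basis, and the sampling range are assumptions chosen for illustration; the summary above does not say how the codec groups latents into vectors or how Hex-Quant is implemented.

```python
import numpy as np

# Basis of a hexagonal lattice, scaled so every Voronoi cell has unit area
# (the same cell volume as the integer grid produced by rounding).
A = np.sqrt(2.0 / np.sqrt(3.0))
BASIS = np.array([[A, 0.0],
                  [A / 2.0, A * np.sqrt(3.0) / 2.0]])  # rows are basis vectors


def hex_quantize(points):
    """Map each 2D point of an (N, 2) array to its nearest lattice point."""
    coords = points @ np.linalg.inv(BASIS)  # real-valued lattice coordinates
    base = np.floor(coords)
    best, best_d = None, None
    # The nearest lattice point is one of the four corners of the
    # parallelogram that contains the point, so test all four.
    for di in (0.0, 1.0):
        for dj in (0.0, 1.0):
            cand = (base + np.array([di, dj])) @ BASIS
            d = np.sum((points - cand) ** 2, axis=1)
            if best is None:
                best, best_d = cand, d
            else:
                closer = d < best_d
                best = np.where(closer[:, None], cand, best)
                best_d = np.where(closer, d, best_d)
    return best


def mse_per_dim(points, quantized):
    """Mean squared quantization error per dimension."""
    return float(np.mean((points - quantized) ** 2))


rng = np.random.default_rng(0)
x = rng.uniform(-50.0, 50.0, size=(200_000, 2))  # assumed toy source
sq_mse = mse_per_dim(x, np.round(x))             # uniform scalar quantization
hex_mse = mse_per_dim(x, hex_quantize(x))        # hexagonal vector quantization
print(f"SQ MSE/dim:  {sq_mse:.5f}")
print(f"Hex MSE/dim: {hex_mse:.5f}")
```

At equal cell volume, the hexagonal grid should give a slightly lower per-dimension error than the square grid (theoretical values are $5/(36\sqrt{3})$ versus $1/12$), which is the kind of gain a fixed tessellation can offer without retraining. This toy comparison ignores the rate side and the entropy model entirely.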

Abstract

End-to-end image and video codecs are becoming increasingly competitive with traditional compression techniques that have been developed through decades of manual engineering effort. These trainable codecs have many advantages over traditional techniques, such as their straightforward adaptation to perceptual distortion metrics and high performance in specific fields thanks to their learning ability. However, current state-of-the-art neural codecs do not fully exploit the benefits of vector quantization or the availability of the entropy gradient at the decoder. In this paper, we propose to leverage these two properties (vector quantization and entropy gradient) to improve the performance of off-the-shelf codecs. First, we demonstrate that non-uniform scalar quantization cannot improve performance over uniform quantization. We therefore suggest using predefined optimal uniform vector quantization to improve performance. Second, we show that the entropy gradient, which is available at the decoder, is correlated with the reconstruction error gradient, which is not available at the decoder. We therefore use the former as a proxy to enhance compression performance. Our experimental results show that these approaches save between 1% and 3% of the rate for the same quality across various pretrained methods. In addition, the entropy-gradient-based solution improves traditional codec performance significantly as well.
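
As an illustration of why the entropy gradient is computable at the decoder, the sketch below evaluates the rate of a dequantized latent under a discretized Gaussian entropy model and its analytic derivative, then applies a shift along that gradient. The Gaussian model, the function names (`rate_bits`, `rate_gradient`, `latent_shift`), and the signed step size `rho` are all assumptions made for this example; the abstract does not specify the entropy model or how the step is chosen.

```python
import math


def _pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _cdf(t):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def rate_bits(y_hat, mu, sigma):
    """Bits needed for y_hat when its PMF is a Gaussian integrated over a unit bin."""
    upper = (y_hat + 0.5 - mu) / sigma
    lower = (y_hat - 0.5 - mu) / sigma
    return -math.log2(_cdf(upper) - _cdf(lower))


def rate_gradient(y_hat, mu, sigma):
    """Analytic derivative of rate_bits with respect to y_hat."""
    upper = (y_hat + 0.5 - mu) / sigma
    lower = (y_hat - 0.5 - mu) / sigma
    p = _cdf(upper) - _cdf(lower)
    return -(_pdf(upper) - _pdf(lower)) / (sigma * p * math.log(2.0))


def latent_shift(y_hat, mu, sigma, rho):
    """Move the decoded latent along the entropy gradient.

    rho is a hypothetical signed step size; how it is selected or signalled
    is not described in the text above.
    """
    return y_hat + rho * rate_gradient(y_hat, mu, sigma)


# Everything used here (y_hat, mu, sigma) is known to the decoder.
y_hat, mu, sigma = 2.0, 0.3, 1.5
grad = rate_gradient(y_hat, mu, sigma)
shifted = latent_shift(y_hat, mu, sigma, rho=0.1)
```

The point of the sketch is only that the gradient depends on quantities the decoder already holds (the decoded latent and the entropy parameters), so it can stand in for the reconstruction-error gradient, which would require the original image.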

Paper Structure (20 sections, 3 theorems, 22 equations, 7 figures, 7 tables)

Key Result

Theorem 1

If a neural codec has an encoder block $g_a: \mathbb{R}^{n \times n \times 3} \rightarrow \mathbb{R}^{m \times m \times o}$, a decoder block $g_s: \mathbb{R}^{m \times m \times o} \rightarrow \mathbb{R}^{n \times n \times 3}$ and it requires a non-uniform SQ map for optimal rate-distortion performa

Figures (7)

  • Figure 1: Block diagram of state-of-the-art neural codecs. The five dark green blocks are the trainable blocks implemented by neural networks, while the binary patterns show the quantization and entropy encoding/decoding processes driven by the entropy model's PMFs on the main and side latents.
  • Figure 2: a) $f: y \rightarrow z$ transforms a non-uniform quantization map (grid borders $b_0,\dots,b_n$ and grid centers $c_1,\dots,c_n$) into a uniform map (centers are located on integers, and each border lies midway between two consecutive centers). b) Uniform SQ grids in 2D. c) Optimal uniform VQ grids in 2D.
  • Figure 3: RD performance of uniform SQ, regular-hexagon Hex-Quant, and truncated-octahedron Oct-Quant grids at different grid volumes. a) Uniform source sampled from $U(-4,4)$, with a zoom-in where the grid volume is unitary. b) Different Gaussian sources. c) RD plot of unitary-volume grids for Gaussian sources.
  • Figure 4: Correlation between the gradients of the entropy and of the reconstruction error w.r.t. the main latents.
  • Figure 5: BD-rates of our proposals relative to the baseline codecs at different quality levels. a) mbt2018mean image codec on the Kodak test set. b) SSF video codec on the UVG test set. c) Correlation between the improvement in reconstruction quality and the correlation of gradients on the CLIC and Kodak datasets.
  • ...and 2 more figures
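
The map described in the Figure 2a caption can be sketched as a monotone piecewise-linear function that sends each center $c_i$ to the integer $i$ and each border $b_i$ to $i + 0.5$, so that plain rounding in the $z$ domain reproduces the non-uniform cells in the $y$ domain. This is one possible construction under that assumption, not necessarily the $f$ used in the paper; `make_uniformizing_map` and the example borders and centers are hypothetical.

```python
import numpy as np


def make_uniformizing_map(borders, centers):
    """Build f and its inverse from borders b_0..b_n and centers c_1..c_n."""
    borders = np.asarray(borders, dtype=float)
    centers = np.asarray(centers, dtype=float)
    n = centers.size
    if borders.size != n + 1:
        raise ValueError("expected n + 1 borders for n centers")

    # Interleave knots as b_0, c_1, b_1, c_2, ..., c_n, b_n.
    knots_y = np.empty(2 * n + 1)
    knots_y[0::2] = borders
    knots_y[1::2] = centers
    if np.any(np.diff(knots_y) <= 0):
        raise ValueError("each center must lie strictly inside its cell")

    # Targets 0.5, 1.0, 1.5, ..., n + 0.5: centers on integers, borders midway.
    knots_z = 0.5 + 0.5 * np.arange(2 * n + 1)

    def f(y):
        # np.interp clamps inputs outside [b_0, b_n] to the end values.
        return np.interp(y, knots_y, knots_z)

    def f_inv(z):
        return np.interp(z, knots_z, knots_y)

    return f, f_inv


# Hypothetical non-uniform scalar quantizer with four cells.
borders = [-3.0, -1.0, 0.0, 0.5, 4.0]
centers = [-2.2, -0.4, 0.2, 1.5]
f, f_inv = make_uniformizing_map(borders, centers)

y = np.array([-2.9, 0.3, 3.0])
indices = np.round(f(y))          # uniform rounding in the z domain
reconstruction = f_inv(indices)   # maps integer indices back to the centers
```

With this construction, rounding $f(y)$ selects the same cell a lookup against the borders would, and $f^{-1}$ of the integer index returns that cell's center.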

Theorems & Definitions (8)

  • Theorem 1
  • Remark 1
  • Remark 2
  • Theorem 2
  • Corollary 2.1
  • proof
  • proof
  • proof