INT-DTT+: Low-Complexity Data-Dependent Transforms for Video Coding
Samuel Fernández-Menduiña, Eduardo Pavez, Antonio Ortega, Tsung-Wei Huang, Thuong Nguyen Canh, Guan-Ming Su, Peng Yin
TL;DR
Data-dependent transforms offer improved coding efficiency but often lack fast implementations. The authors introduce $\text{DTT}^+$, a family of graph-based separable transforms (GBSTs) obtained by applying rank-one updates to the graphs of base discrete trigonometric transforms (DTTs). They then derive a low-complexity integer realization, $\text{INT-DTT}^+$, by progressively decomposing each kernel into a base DTT followed by a structured Cauchy transition matrix. The updates are learned from data with a Cartesian-product graph model that estimates the row and column updates jointly, and the transforms are optimized within a rate-distortion optimized transform (RDOT) framework for mode-dependent transforms (MDT) in VVC. In intra-prediction experiments, they report BD-rate savings of more than $3\%$, with complexity close to that of the integer DCT-2 and substantial memory reductions thanks to the sparsified Cauchy transitions. Overall, the framework makes learned data-dependent transforms practical to deploy in modern codecs while retaining meaningful coding gains.
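The progressive decomposition can be illustrated with the textbook rank-one eigen-update for symmetric matrices (the secular-equation result): if $L = U \Lambda U^\top$ and $w = U^\top v$, every eigenvector of $L + \rho v v^\top$ is $U$ times a column proportional to $(\Lambda - \mu_j I)^{-1} w$. The basis therefore factors into the base DTT followed by a Cauchy-like transition with entries $w_i / (\lambda_i - \mu_j)$. The Python sketch below assumes a DCT-2 base graph (an unweighted path), a hypothetical self-loop update on the first node, and hypothetical function names. It uses `numpy.linalg.eigvalsh` in place of a dedicated secular-equation solver and covers only the non-deflated case. It is an illustration under these assumptions, not the authors' implementation.

```python
import numpy as np

def path_laplacian(n):
    """Combinatorial Laplacian of an unweighted path graph; its eigenvectors are the DCT-2 basis."""
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0
    return L

def dct2_basis(n):
    """Orthonormal DCT-2 basis vectors as columns, with graph frequencies 2 - 2cos(pi k / n)."""
    m, k = np.arange(n) + 0.5, np.arange(n)
    U = np.cos(np.pi * np.outer(m, k) / n)
    U[:, 0] *= np.sqrt(1.0 / n)
    U[:, 1:] *= np.sqrt(2.0 / n)
    lam = 2.0 - 2.0 * np.cos(np.pi * k / n)
    return U, lam

def dtt_plus_progressive(n, v, rho):
    """Hypothetical helper: basis of L + rho * v v^T factored as U @ C (base DCT-2 times Cauchy transition).

    Assumes the non-deflated case: every entry of w = U^T v is nonzero, so eigenvalues strictly interlace.
    """
    U, lam = dct2_basis(n)
    L_plus = path_laplacian(n) + rho * np.outer(v, v)
    mu = np.linalg.eigvalsh(L_plus)                    # updated graph frequencies (roots of the secular equation)
    w = U.T @ v                                        # update vector expressed in the DCT-2 domain
    C = w[:, None] / (lam[:, None] - mu[None, :])      # Cauchy-like: C[i, j] = w_i / (lam_i - mu_j)
    C /= np.linalg.norm(C, axis=0, keepdims=True)      # unit-norm columns
    return U, C, L_plus, mu

n, rho = 8, 1.0
v = np.zeros(n)
v[0] = 1.0                  # hypothetical update: self-loop of weight rho on the first node
U, C, L_plus, mu = dtt_plus_progressive(n, v, rho)
T = U @ C                   # updated basis vectors as columns
x = np.random.default_rng(0).standard_normal(n)
y = C.T @ (U.T @ x)         # progressive: base DCT-2 first, then the Cauchy transition
```

Because `y` is computed as `C.T @ (U.T @ x)`, any fast DCT-2 can be reused unchanged for the first stage, and only the small $n \times n$ transition depends on the data.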
Abstract
Discrete trigonometric transforms (DTTs), such as the DCT-2 and the DST-7, are widely used in video codecs for their balance between coding performance and computational efficiency. In contrast, data-dependent transforms, such as the Karhunen-Loève transform (KLT) and graph-based separable transforms (GBSTs), offer better energy compaction but lack symmetries that can be exploited to reduce computational complexity. This paper bridges this gap by introducing a general framework to design low-complexity data-dependent transforms. Our approach builds on DTT+, a family of GBSTs derived from rank-one updates of the DTT graphs, which can adapt to signal statistics while retaining a structure amenable to fast computation. We first propose a graph learning algorithm for DTT+ that estimates the rank-one updates for the row and column graphs jointly, capturing the statistical properties of the overall block. Then, we exploit the progressive structure of DTT+ to decompose the kernel into a base DTT and a structured Cauchy matrix. By leveraging low-complexity integer DTTs and sparsifying the Cauchy matrix, we construct an integer approximation to DTT+, termed INT-DTT+. This approximation significantly reduces both computational and memory complexity relative to the separable KLT, with minimal performance loss. We validate our approach in the context of mode-dependent transforms for the VVC standard, following a rate-distortion optimized transform (RDOT) design approach. When integrated into the explicit multiple transform selection (MTS) framework of VVC with rate-distortion optimized selection, INT-DTT+ achieves more than 3% BD-rate savings over the VVC MTS baseline, with complexity comparable to the integer DCT-2 once the base DTT coefficients are available.
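The sparsify-and-integerize step can be sketched as follows, under explicitly hypothetical choices. The transition matrix is rebuilt directly in the DTT domain as the eigenvectors of $\Lambda + \rho w w^\top$. Each column then keeps only its `keep` largest-magnitude entries, the surviving entries are scaled by $2^7$ and rounded, and the integer product is followed by a rounding right shift. The sparsification rule, the 7-bit scale, the update vector `w`, the example coefficients, and all function names are assumptions for illustration, not the paper's design.

```python
import numpy as np

def cauchy_transition(lam, w, rho):
    """Column-normalized C[i, j] = w_i / (lam_i - mu_j): the eigenvectors of diag(lam) + rho * w w^T."""
    mu = np.linalg.eigvalsh(np.diag(lam) + rho * np.outer(w, w))
    C = w[:, None] / (lam[:, None] - mu[None, :])
    return C / np.linalg.norm(C, axis=0, keepdims=True)

def sparsify_and_quantize(C, keep, shift):
    """Hypothetical rule: keep the `keep` largest-magnitude entries per column, scale by 2**shift, round."""
    Cs = np.zeros_like(C)
    idx = np.argsort(-np.abs(C), axis=0)[:keep]   # row indices of the largest entries in each column
    cols = np.arange(C.shape[1])
    Cs[idx, cols] = C[idx, cols]
    return np.round(Cs * (1 << shift)).astype(np.int64)

def apply_int_transition(C_int, coeffs, shift):
    """Integer transition y = (C_int^T c + rounding offset) >> shift on integer base-DTT coefficients."""
    z = C_int.T @ coeffs.astype(np.int64)
    return (z + (1 << (shift - 1))) >> shift

n = 8
lam = 2.0 - 2.0 * np.cos(np.pi * np.arange(n) / n)   # DCT-2 graph frequencies
w = np.cos(np.pi * np.arange(n) / (2 * n))            # hypothetical update vector in the DCT domain
C = cauchy_transition(lam, w, rho=1.0)
C_int = sparsify_and_quantize(C, keep=3, shift=7)     # at most 3 nonzero integers per column
c = np.array([120, -40, 15, 8, -3, 2, 0, -1])         # hypothetical integer base-DCT coefficients
y = apply_int_transition(C_int, c, shift=7)
```

Dropping small entries trades some orthogonality for fewer multiplications and a smaller table to store; in this sketch, `keep` controls that trade-off.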
