INT-DTT+: Low-Complexity Data-Dependent Transforms for Video Coding

Samuel Fernández-Menduiña, Eduardo Pavez, Antonio Ortega, Tsung-Wei Huang, Thuong Nguyen Canh, Guan-Ming Su, Peng Yin

TL;DR

Data-dependent transforms offer improved coding efficiency but often lack fast implementations. The authors introduce DTT+, a family of GBSTs derived from rank-one updates to base DTT graphs, and derive a low-complexity integer realization, INT-DTT+, via a progressive decomposition into a base DTT and a structured Cauchy transition. They learn the updates from data using a Cartesian-product graph model and optimize the transforms within a rate-distortion-optimized transform (RDOT) framework in a VVC mode-dependent transform (MDT) setting. In intra-prediction experiments, they report more than 3% BD-rate gains with complexity close to that of the integer DCT-2, and substantial memory reductions from sparsifying the Cauchy transitions. Overall, the framework makes learned data-dependent transforms practical to deploy in modern codecs.

Abstract

Discrete trigonometric transforms (DTTs), such as the DCT-2 and the DST-7, are widely used in video codecs for their balance between coding performance and computational efficiency. In contrast, data-dependent transforms, such as the Karhunen-Loève transform (KLT) and graph-based separable transforms (GBSTs), offer better energy compaction but lack symmetries that can be exploited to reduce computational complexity. This paper bridges this gap by introducing a general framework to design low-complexity data-dependent transforms. Our approach builds on DTT+, a family of GBSTs derived from rank-one updates of the DTT graphs, which can adapt to signal statistics while retaining a structure amenable to fast computation. We first propose a graph learning algorithm for DTT+ that estimates the rank-one updates for rows and column graphs jointly, capturing the statistical properties of the overall block. Then, we exploit the progressive structure of DTT+ to decompose the kernel into a base DTT and a structured Cauchy matrix. By leveraging low-complexity integer DTTs and sparsifying the Cauchy matrix, we construct an integer approximation to DTT+, termed INT-DTT+. This approximation significantly reduces both computational and memory complexities with respect to the separable KLT with minimal performance loss. We validate our approach in the context of mode-dependent transforms for the VVC standard, following a rate-distortion optimized transform (RDOT) design approach. Integrated into the explicit multiple transform selection (MTS) framework of VVC in a rate-distortion optimization setup, INT-DTT+ achieves more than 3% BD-rate savings over the VVC MTS baseline, with complexity comparable to the integer DCT-2 once the base DTT coefficients are available.
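To make the graph view of DTTs concrete, the following minimal NumPy sketch checks numerically that the eigenvectors of a unit-weight path-graph Laplacian match the DCT-2 basis, and that adding a unit self-loop on the first node (a rank-one change to the Laplacian) yields the DST-7 basis. The helper names (`path_laplacian`, `dct2_basis`, `dst7_basis`, `same_up_to_sign`) and the block size `n = 8` are illustrative choices, not taken from the paper.

```python
import numpy as np

def path_laplacian(n, self_loop=0.0):
    """Laplacian of an n-node unit-weight path graph, with an optional
    self-loop of weight `self_loop` on the first node."""
    L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = 1.0 + self_loop   # end node: one neighbour plus the self-loop
    L[-1, -1] = 1.0             # other end node: one neighbour only
    return L

def dct2_basis(n):
    """Orthonormal DCT-2 basis; column k is the k-th basis vector."""
    k = np.arange(n)[None, :]
    m = np.arange(n)[:, None]
    B = np.cos(np.pi * k * (m + 0.5) / n)
    return B / np.linalg.norm(B, axis=0)

def dst7_basis(n):
    """Orthonormal DST-7 basis; column k is the k-th basis vector."""
    k = np.arange(n)[None, :]
    m = np.arange(n)[:, None]
    B = np.sin(np.pi * (2 * k + 1) * (m + 1) / (2 * n + 1))
    return B / np.linalg.norm(B, axis=0)

def same_up_to_sign(U, V, tol=1e-8):
    """True if the columns of U and V agree up to a sign flip per column."""
    return np.allclose(np.abs(U.T @ V), np.eye(U.shape[0]), atol=tol)

n = 8  # illustrative block size
_, U_path = np.linalg.eigh(path_laplacian(n))                 # plain path graph
_, U_loop = np.linalg.eigh(path_laplacian(n, self_loop=1.0))  # unit self-loop

dct_matches = same_up_to_sign(U_path, dct2_basis(n))
dst_matches = same_up_to_sign(U_loop, dst7_basis(n))
print(dct_matches, dst_matches)
```

The comparison is made up to sign because an eigendecomposition fixes each eigenvector only up to a factor of -1.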

Paper Structure

This paper contains 6 sections, 2 theorems, 12 equations, 5 figures, 4 tables, 2 algorithms.

Key Result

Proposition 2.1

Given $\mathbf{L} = \mathbf{U}\,\mathrm{diag}(\boldsymbol{\lambda})\,\mathbf{U}^\top$ and $\tilde{\mathbf{L}}(\alpha, \beta, i) = \tilde{\mathbf{U}}\,\mathrm{diag}(\tilde{\boldsymbol{\lambda}})\,\tilde{\mathbf{U}}^\top$, we can write: where $\mathbf{C}(\tilde{\boldsymbol{\lambda}}, \beta\boldsymbol{\lambda})$ is a Cauchy matrix fernandez-menduina2025fast such that $C_{ij} = 1/(\tilde{\lambda}_i - \beta
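The statement above is cut off in this copy, so the following is only a numerical sketch of the kind of structure it refers to, not a reproduction of the proposition. It assumes the update has the form $\tilde{\mathbf{L}} = \beta\mathbf{L} + \alpha\,\mathbf{e}_i\mathbf{e}_i^\top$ (an assumption about the notation $\tilde{\mathbf{L}}(\alpha, \beta, i)$) and uses illustrative parameter values. Under that assumption, the classical rank-one eigen-update result says that the transition $\mathbf{U}^\top\tilde{\mathbf{U}}$ equals a Cauchy matrix with entries $1/(\tilde{\lambda}_k - \beta\lambda_j)$, scaled row-wise and column-wise by diagonal factors; the code checks this.

```python
import numpy as np

n, alpha, beta, i = 8, 0.7, 1.3, 0   # illustrative values, not from the paper

# Base graph: unit-weight path with a unit self-loop on node 0 (DST-7 graph).
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[-1, -1] = 1.0
lam, U = np.linalg.eigh(L)

# ASSUMED form of the update: scale by beta, add a self-loop alpha at node i.
e = np.zeros(n)
e[i] = 1.0
L_tilde = beta * L + alpha * np.outer(e, e)
lam_t, U_t = np.linalg.eigh(L_tilde)

# Cauchy matrix with entries C[k, j] = 1 / (lam_t[k] - beta * lam[j]).
C = 1.0 / (lam_t[:, None] - beta * lam[None, :])

# Transition between the two eigenbases: T[j, k] = <u_j, u~_k>.
T = U.T @ U_t

# Rank-one update theory: column k of T is proportional to
# z[j] / (beta * lam[j] - lam_t[k]), where z = U^T e_i.
z = U.T @ e
P = -(z[:, None] * C.T)                  # unnormalised prediction
P /= np.linalg.norm(P, axis=0)           # unit-norm columns
P *= np.sign(np.sum(P * T, axis=0))      # resolve the per-column sign

max_err = np.max(np.abs(P - T))
print(max_err)
```

Because the only dense factor is the Cauchy matrix, the transition is fully described by the two eigenvalue vectors and two diagonal scalings, which is what makes a structured or sparsified representation plausible.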

Figures (5)

  • Figure 1: (a) Path graph (DCT-2), (b) path graph with unit self-loop (DST-7), and (c) DTT+ graph with parameters $(\alpha, \beta)$.
  • Figure 2: Graph-based model for prediction residuals as the Cartesian product of two DTT+ graphs.
  • Figure 3: (a) Transition kernel between DST-7 and the DTT+ learned for the planar mode in VVC, (b) results after quantization with step $16$, and (c) its integer version, factoring out divisions by $16$. Quantization yields an integer and sparse approximation to the original kernel.
  • Figure 4: Learned weights for rows (R) and columns (C), with $8\times 8$, $16\times 16$, and $32\times 32$ blocks. We observe spatial consistency.
  • Figure 5: Operation count for the forward integer transforms. For INT-DTT+, we assume the coefficients of the base DTT are already available, as in RDO scenarios where the DTTs are almost always computed, and we show the median, maximum, and minimum across all modes. The complexity of INT-DTT+ is comparable to that of the integer DCT-2.
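The quantization described in the Figure 3 caption can be sketched as follows. The sketch assumes that "step 16" means scaling the transition kernel by 16 and rounding to the nearest integer, so that the kernel is approximated by an integer matrix divided by 16. The update parameters (1.1 and 0.3) are hypothetical stand-ins for a learned DTT+, and `quantize_kernel` is an illustrative helper, not a function from the paper.

```python
import numpy as np

def quantize_kernel(T, scale=16):
    """Scale and round to the nearest integer; T is approximated by K / scale."""
    return np.rint(scale * T).astype(np.int64)

n = 8  # illustrative block size

# Base graph: unit-weight path with a unit self-loop on node 0 (DST-7 graph).
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[-1, -1] = 1.0
_, U = np.linalg.eigh(L)

# Hypothetical stand-in for a learned update (values are NOT from the paper).
L_tilde = 1.1 * L
L_tilde[0, 0] += 0.3
_, U_t = np.linalg.eigh(L_tilde)

T = U.T @ U_t                 # transition kernel between the two bases
T_int = quantize_kernel(T)    # integer version, division by 16 factored out

zero_fraction = np.mean(T_int == 0)           # sparsity after rounding
max_abs_err = np.max(np.abs(T - T_int / 16))  # rounding error per entry
print(zero_fraction, max_abs_err)
```

Entries smaller than 1/32 in magnitude round to zero, which is where the sparsity comes from; the per-entry error is bounded by half a quantization step.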

Theorems & Definitions (2)

  • Proposition 2.1: Progressive decomposition fernandez-menduina2025fast
  • Proposition 2.2: Eigenvalue interleaving bunch1978rank
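Eigenvalue interleaving for a positive rank-one update can be checked numerically. The sketch below assumes an update of the form $\beta\mathbf{L} + \alpha\,\mathbf{e}_i\mathbf{e}_i^\top$ with $\alpha > 0$ (an assumption about the paper's parametrization) and uses illustrative values. It tests the classical bounds $\beta\lambda_k \le \tilde{\lambda}_k \le \beta\lambda_{k+1}$, together with $\tilde{\lambda}_n \le \beta\lambda_n + \alpha$ for the largest eigenvalue.

```python
import numpy as np

def interleaves(lam, lam_t, alpha, tol=1e-10):
    """Check lam[k] <= lam_t[k] <= lam[k+1] for all k, and that the largest
    updated eigenvalue does not exceed lam[-1] + alpha (requires alpha > 0)."""
    lower = np.all(lam_t >= lam - tol)
    upper = np.all(lam_t[:-1] <= lam[1:] + tol)
    top = lam_t[-1] <= lam[-1] + alpha + tol
    return bool(lower and upper and top)

n, alpha, beta = 8, 0.5, 1.2   # illustrative values, not from the paper

# Base graph: unit-weight path graph (DCT-2 graph).
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = 1.0
L[-1, -1] = 1.0
lam = np.linalg.eigvalsh(beta * L)   # eigenvalues of the scaled base graph

results = []
for i in range(n):                   # try a self-loop at every node
    L_tilde = beta * L
    L_tilde[i, i] += alpha           # rank-one update alpha * e_i e_i^T
    lam_t = np.linalg.eigvalsh(L_tilde)
    results.append(interleaves(lam, lam_t, alpha))

all_interleave = all(results)
print(all_interleave)
```

Interleaving keeps the updated eigenvalues separated from the scaled base eigenvalues whenever the inequalities are strict, so the denominators of the Cauchy entries stay nonzero.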