Tree-NeRV: A Tree-Structured Neural Representation for Efficient Non-Uniform Video Encoding

Jiancheng Zhao; Yifan Zhan; Qingtian Zhu; Mingze Ma; Muyao Niu; Zunian Wan; Xiang Ji; Yinqiang Zheng

TL;DR

Tree-NeRV addresses the inefficiency of uniform temporal sampling in implicit neural video representations by storing features in a Binary Search Tree (BST), which allows non-uniform, adaptive sampling along the video timeline. An optimization-driven training strategy grows the tree so that more samples are allocated to regions of high temporal variation, while AVL balancing keeps queries efficient. The method couples a BST-based time embedding with cascaded NeRV blocks and reports state-of-the-art reconstruction quality and competitive rate-distortion (RD) performance on standard datasets, along with faster encoding and decoding than several baselines. Empirically, Tree-NeRV improves PSNR, aligns its sampling density with temporal dynamics, and remains practical to encode and decode; pruning strategies are left as future work for further gains.
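The idea of a non-uniform temporal feature grid that grows where the video changes most can be illustrated with a small sketch. This is not the authors' implementation. The class `NonUniformTimeGrid` and its methods `query`, `insert`, and `grow` are hypothetical names. A sorted list with binary search stands in for the AVL-balanced BST. Linear interpolation between the two neighboring nodes, midpoint splitting, and a per-interval error score supplied by the caller are all assumptions made for illustration.

```python
from bisect import bisect_right


class NonUniformTimeGrid:
    """Sorted timestamps, one feature vector each (stand-in for a balanced BST)."""

    def __init__(self, keys, feats):
        pairs = sorted(zip(keys, feats))
        self.keys = [k for k, _ in pairs]
        self.feats = [list(f) for _, f in pairs]

    def query(self, t):
        """Interpolate linearly between the stored neighbors of t (assumed rule)."""
        i = bisect_right(self.keys, t)  # binary search, O(log n)
        if i == 0:
            return list(self.feats[0])  # before the first node: clamp
        if i == len(self.keys):
            return list(self.feats[-1])  # at or after the last node: clamp
        t0, t1 = self.keys[i - 1], self.keys[i]
        w = (t - t0) / (t1 - t0)
        return [(1 - w) * a + w * b
                for a, b in zip(self.feats[i - 1], self.feats[i])]

    def insert(self, t):
        """Add a node at t, initialized from the current interpolated feature."""
        if t in self.keys:
            return
        feat = self.query(t)
        i = bisect_right(self.keys, t)
        self.keys.insert(i, t)
        self.feats.insert(i, feat)

    def grow(self, interval_errors):
        """Split the interval with the largest error at its midpoint (assumed rule).

        interval_errors[j] scores the interval between keys[j] and keys[j + 1].
        """
        worst = max(range(len(interval_errors)), key=interval_errors.__getitem__)
        self.insert(0.5 * (self.keys[worst] + self.keys[worst + 1]))


# Usage: start with two nodes, then refine where the (made-up) error is highest.
grid = NonUniformTimeGrid([0.0, 1.0], [[0.0, 0.0], [2.0, 4.0]])
grid.grow([0.9])        # one interval, so it is split at t = 0.5
grid.grow([0.1, 0.7])   # the second interval is worse, so it is split at t = 0.75
print(grid.keys)
```

After the two growth steps the nodes are denser in the second half of the timeline, which is the kind of non-uniform allocation the summary describes. A real AVL tree would additionally keep insertion at logarithmic cost, which the list-based stand-in does not.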

Abstract

Implicit Neural Representations for Videos (NeRV) have emerged as a powerful paradigm for video representation, enabling direct mappings from frame indices to video frames. However, existing NeRV-based methods do not fully exploit temporal redundancy, as they rely on uniform sampling along the temporal axis, leading to suboptimal rate-distortion (RD) performance. To address this limitation, we propose Tree-NeRV, a novel tree-structured feature representation for efficient and adaptive video encoding. Unlike conventional approaches, Tree-NeRV organizes feature representations within a Binary Search Tree (BST), enabling non-uniform sampling along the temporal axis. Additionally, we introduce an optimization-driven sampling strategy, dynamically allocating higher sampling density to regions with greater temporal variation. Extensive experiments demonstrate that Tree-NeRV achieves superior compression efficiency and reconstruction quality, outperforming prior uniform sampling-based methods. Code will be released.
