
Tensor4D: Efficient Neural 4D Decomposition for High-fidelity Dynamic Reconstruction and Rendering

Ruizhi Shao, Zerong Zheng, Hanzhang Tu, Boning Liu, Hongwen Zhang, Yebin Liu

TL;DR

Tensor4D introduces a memory-efficient hierarchical tri-projection to represent dynamic 4D fields using nine 2D feature planes, enabling high-fidelity reconstruction from sparse-view or monocular inputs. The method combines a coarse-to-fine optimization with explicit feature grids to accelerate training while preserving detail, applicable to time-conditioned radiance fields (NeRF-T) and 4D flow fields (D-NeRF). Extensive experiments on synthetic and real-world sequences show state-of-the-art rendering quality and robust reconstruction under challenging capture setups, with reduced training time and memory usage. This work advances dynamic scene capture and telepresence by offering a scalable, explicit-structure alternative to fully implicit 4D NeRF models.
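The following back-of-the-envelope sketch makes the memory argument concrete. It compares the number of feature entries in a dense 4D grid with the number in nine 2D planes. The resolution, frame count, and channel count (`n=256`, `t=100`, `c=16`) are hypothetical. The split into three spatial planes plus six spatio-temporal planes assumes that each time-aware volume drops one spatial axis. The paper's actual configuration may differ.

```python
# Back-of-the-envelope memory comparison (illustrative sizes, not the paper's settings).

def dense_4d_entries(n: int, t: int, c: int) -> int:
    """Feature entries in a dense grid with n^3 spatial cells, t time steps and c channels."""
    return n ** 3 * t * c


def nine_plane_entries(n: int, t: int, c: int) -> int:
    """Feature entries in nine 2D planes, ASSUMING three spatial (n x n) planes
    and six spatio-temporal (n x t) planes, each with c channels."""
    return 3 * n * n * c + 6 * n * t * c


n, t, c = 256, 100, 16  # hypothetical resolution, frame count and channel count
dense = dense_4d_entries(n, t, c)
planes = nine_plane_entries(n, t, c)
bytes_per_float = 4  # float32 storage
print(f"dense 4D grid: {dense * bytes_per_float / 2**30:.1f} GiB")   # 100.0 GiB
print(f"nine planes:   {planes * bytes_per_float / 2**20:.1f} MiB")  # 21.4 MiB
```

The dense grid grows as n^3 * t, while the planes grow only as n^2 + n * t. This is why the factorized form stays tractable at high spatial resolution.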

Abstract

We present Tensor4D, an efficient yet effective approach to dynamic scene modeling. The key to our solution is an efficient 4D tensor decomposition method that allows the dynamic scene to be represented directly as a 4D spatio-temporal tensor. To tackle the accompanying memory issue, we decompose the 4D tensor hierarchically, projecting it first into three time-aware volumes and then into nine compact feature planes. In this way, spatial information over time can be captured simultaneously in a compact and memory-efficient manner. When applying Tensor4D to dynamic scene reconstruction and rendering, we further factorize the 4D fields into different scales, so that structural motions and fine-grained dynamic changes can be learned from coarse to fine. The effectiveness of our method is validated on both synthetic and real-world scenes. Extensive experiments show that our method achieves high-quality dynamic reconstruction and rendering from sparse-view camera rigs or even a monocular camera. The code and dataset will be released at https://liuyebin.com/tensor4d/tensor4d.html.
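The hierarchical tri-projection can be sketched as a feature lookup. A point (x, y, z, t) is projected onto each of the nine planes, a feature vector is bilinearly interpolated from each plane, and the results are combined. This NumPy sketch rests on several assumptions. The three time-aware volumes are taken to be (x, y, t), (x, z, t), and (y, z, t). Features are combined by concatenation. The coarse-to-fine factorization is modeled as two plane sets (LR and HR) whose features are also concatenated. The helper names (`bilinear`, `make_planes`, `query`, `query_multiscale`) are hypothetical, and the planes are randomly initialized. The decoder that would map features to scene properties is omitted.

```python
import numpy as np

# Axis indices into a point (x, y, z, t).
X, Y, Z, T = 0, 1, 2, 3
# ASSUMPTION: each time-aware volume drops one spatial axis.
VOLUMES = [(X, Y, T), (X, Z, T), (Y, Z, T)]
# Each volume is projected onto its three coordinate planes -> nine planes in total.
PLANE_AXES = [pair for a, b, c in VOLUMES for pair in ((a, b), (a, c), (b, c))]


def bilinear(plane: np.ndarray, u: float, v: float) -> np.ndarray:
    """Bilinearly sample a (C, H, W) plane at normalized coords u (width), v (height) in [0, 1]."""
    _, h, w = plane.shape
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * plane[:, y0, x0] + wx * plane[:, y0, x1]
    bottom = (1 - wx) * plane[:, y1, x0] + wx * plane[:, y1, x1]
    return (1 - wy) * top + wy * bottom


def make_planes(res: dict, channels: int, rng: np.random.Generator) -> list:
    """One (C, res[b], res[a]) plane per axis pair; random values for illustration only."""
    return [rng.standard_normal((channels, res[b], res[a])) for a, b in PLANE_AXES]


def query(planes: list, point) -> np.ndarray:
    """Concatenate features sampled from all nine planes at one (x, y, z, t) point in [0, 1]^4."""
    return np.concatenate([bilinear(p, point[a], point[b]) for p, (a, b) in zip(planes, PLANE_AXES)])


def query_multiscale(lr_planes: list, hr_planes: list, point) -> np.ndarray:
    """ASSUMPTION: coarse (LR) and fine (HR) features are simply concatenated."""
    return np.concatenate([query(lr_planes, point), query(hr_planes, point)])


rng = np.random.default_rng(0)
lr = make_planes({X: 32, Y: 32, Z: 32, T: 8}, channels=4, rng=rng)
hr = make_planes({X: 128, Y: 128, Z: 128, T: 32}, channels=4, rng=rng)
feat = query_multiscale(lr, hr, (0.25, 0.5, 0.75, 0.1))
print(feat.shape)  # (72,) = 2 scales x 9 planes x 4 channels
```

In a trained model, the planes would be learnable parameters optimized through a differentiable renderer. The sketch only shows how each 4D query reduces to nine 2D lookups per scale.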

Paper Structure

This paper contains 14 sections, 18 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Given 4 sparse, static RGB camera views of a dynamic scene (a), our proposed Tensor4D decomposition enables fine-grained multi-view geometry reconstruction, even of human fingers (b), and temporally consistent novel view synthesis on a 3D holographic display (c, d, e). The 4 cameras are placed at the four corners of the display. The proposed method demonstrates a low-cost, portable, and highly immersive telepresence experience.
  • Figure 2: Illustration of our hierarchical tri-projection decomposition method. For a neural 4D field $f(x, y, z, t)$, we first decompose the 3D spatial part of the 4D spatio-temporal tensor into three time-aware volumes, which are then further projected onto nine 2D planes.
  • Figure 3: The framework of Tensor4D for multi-view and monocular reconstruction. a) Tensor4D for multi-view reconstruction: the 4D NeRF-T fields are factorized separately by low-resolution (LR) and high-resolution (HR) feature planes. b) Tensor4D for monocular reconstruction: the 4D flow fields are factorized by LR feature planes for better disentanglement of shape and motion, and the 3D canonical representation is factorized by three LR and HR feature planes.
  • Figure 4: Example results of our method: space-time novel view rendering from sparse, fixed cameras. The top three results use four front-view cameras, and the bottom result uses 12 cameras arranged in a circle.
  • Figure 5: Comparison on the monocular synthetic dataset against D-NeRF [D-Nerf2021-nfds54] and TiNeuVox [TiNeuVox2022-na13].
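In the monocular setting of Figure 3(b), a 4D flow field warps each observed point into a shared canonical space, where a static 3D representation is queried. The sketch below replaces both learned fields with hypothetical analytic stand-ins. The flow field undoes a translation of 0.5 units per unit time along x. The canonical geometry is a unit-sphere signed distance function, chosen only for illustration. In Tensor4D, both fields would instead be factorized by feature planes as the Figure 3 caption describes.

```python
import numpy as np


def flow_field(p: np.ndarray, t: float) -> np.ndarray:
    """Hypothetical flow field: displacement mapping a point observed at time t
    back to canonical space. The toy object moves +0.5*t along x, so the
    displacement is -0.5*t along x (independent of p in this toy case)."""
    return np.array([-0.5 * t, 0.0, 0.0])


def canonical_sdf(p: np.ndarray) -> float:
    """Hypothetical canonical geometry: signed distance to a unit sphere at the origin."""
    return float(np.linalg.norm(p) - 1.0)


def observed_sdf(p, t: float) -> float:
    """Query the dynamic scene at (p, t) by first warping p into canonical space."""
    p = np.asarray(p, dtype=float)
    return canonical_sdf(p + flow_field(p, t))


# At t = 1 the sphere has moved to x = 0.5, so its center and surface shift with it.
print(observed_sdf([0.5, 0.0, 0.0], t=1.0))  # -1.0 (at the moved center)
print(observed_sdf([1.5, 0.0, 0.0], t=1.0))  # 0.0 (on the moved surface)
```

Motion stays in the flow field and shape stays in the canonical field, which is the disentanglement the caption refers to. Once warped, every frame constrains the same canonical geometry.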