NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields

Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen, Lele Chen, Junsong Yuan, Yi Xu, Andreas Geiger

TL;DR

NeRFPlayer tackles fast, free-viewpoint rendering of dynamic scenes from limited RGB input by decomposing spatiotemporal space into static, deforming, and new regions, each represented by a separate neural field. It couples this temporal decomposition with a streamable hybrid representation that uses a sliding window over feature channels to enable compact, frame-by-frame streaming and interpolation. The method is trained with a reconstruction loss plus a global parsimony regularization and demonstrates competitive rendering quality with fast reconstruction (about 10 seconds per frame) and interactive rendering on both multi-camera and single-camera datasets; ablations confirm the value of the three-region decomposition and the streaming scheme. This work enables efficient, interactive exploration of dynamic scenes in VR using modest capture setups and streaming bandwidth, with potential for broader dynamic-NeRF applications.
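The sliding window over feature channels can be pictured with a minimal sketch. It assumes, purely for illustration, that each point carries a flat list of channel values, that frame t reads `window` consecutive channels starting at channel `t * stride`, and that a fractional start is resolved by blending the two neighbouring integer-offset windows. `window_features` and `channels_to_stream` are hypothetical helpers, not the authors' code. The sketch shows why such a scheme streams well: consecutive frames share most of their channels, so advancing one frame only needs the channels that newly enter the window.

```python
import math
from typing import List, Sequence


def window_features(channels: Sequence[float], t: float, window: int, stride: int) -> List[float]:
    """Features active at continuous time t under a hypothetical sliding-window scheme.

    The window starts at channel t * stride. A fractional start is resolved by
    linearly blending the windows at the two neighbouring integer offsets.
    """
    start = t * stride
    lo = math.floor(start)
    frac = start - lo
    needed = lo + window + (1 if frac > 0 else 0)
    if lo < 0 or needed > len(channels):
        raise ValueError("time t falls outside the stored channels")
    a = channels[lo:lo + window]
    if frac == 0:
        return list(a)
    b = channels[lo + 1:lo + 1 + window]
    return [(1 - frac) * x + frac * y for x, y in zip(a, b)]


def channels_to_stream(prev_frame: int, next_frame: int, window: int, stride: int) -> List[int]:
    """Channel indices needed for next_frame that were not already loaded for prev_frame."""
    loaded = set(range(prev_frame * stride, prev_frame * stride + window))
    start = next_frame * stride
    return [c for c in range(start, start + window) if c not in loaded]


channels = [float(c) for c in range(10)]
print(window_features(channels, 0, window=4, stride=2))     # [0.0, 1.0, 2.0, 3.0]
print(window_features(channels, 1, window=4, stride=2))     # [2.0, 3.0, 4.0, 5.0]
print(window_features(channels, 0.25, window=4, stride=2))  # [0.5, 1.5, 2.5, 3.5]
print(channels_to_stream(0, 1, window=4, stride=2))         # [4, 5]
```

With a window of 4 and a stride of 2, each new frame reuses half of the previous window, so only 2 channels per frame would have to be transmitted in this toy setting.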

Abstract

Visually exploring a real-world 4D spatiotemporal space freely in VR has been a long-term quest. The task is especially appealing when only a few RGB cameras, or even a single one, are used to capture the dynamic scene. To this end, we present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering. First, we propose to decompose the 4D spatiotemporal space according to temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas. Each area is represented and regularized by a separate neural field. Second, we propose a feature streaming scheme based on hybrid representations for efficiently modeling the neural fields. Our approach, coined NeRFPlayer, is evaluated on dynamic scenes captured by single hand-held cameras and multi-camera arrays, achieving rendering quality and speed comparable or superior to recent state-of-the-art methods, with reconstruction in 10 seconds per frame and interactive rendering.
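The per-point probabilities over the three categories can be illustrated as a probability-weighted mixture of the three fields' outputs at one sample point. This is a minimal sketch under stated assumptions: the decomposition field is taken to emit three logits (static, deforming, new) that a softmax turns into probabilities, and density and color are mixed linearly by those probabilities. The exact mixing rule and the names `decompose` and `blend_fields` are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]
FieldOutput = Tuple[float, RGB]  # (density, color) predicted by one field


def decompose(logits: Sequence[float]) -> List[float]:
    """Softmax over (static, deforming, new) logits -> probabilities summing to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_fields(
    logits: Sequence[float], outputs: Sequence[FieldOutput]
) -> Tuple[List[float], float, RGB]:
    """Mix the three fields' (density, color) outputs with the decomposition probabilities.

    Hypothetical linear mixing rule; `outputs` must follow the same
    (static, deforming, new) order as `logits`.
    """
    if len(logits) != 3 or len(outputs) != 3:
        raise ValueError("expected exactly three categories")
    probs = decompose(logits)
    sigma = sum(p * out[0] for p, out in zip(probs, outputs))
    rgb = tuple(sum(p * out[1][k] for p, out in zip(probs, outputs)) for k in range(3))
    return probs, sigma, rgb


# A point the decomposition is confident is static: the static field dominates,
# so the mixed density is close to the static field's density of 2.0.
probs, sigma, rgb = blend_fields(
    [8.0, -8.0, -8.0],
    [(2.0, (1.0, 0.0, 0.0)), (5.0, (0.0, 1.0, 0.0)), (5.0, (0.0, 0.0, 1.0))],
)
print([round(p, 4) for p in probs], round(sigma, 3))
```

A softmax keeps the three probabilities non-negative and summing to one, so a point can sit between categories during optimization instead of being hard-assigned to one field.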

Paper Structure (27 sections, 4 equations, 13 figures, 6 tables)

Figures (13)

  • Figure 1: (a) Our framework takes as input RGB images captured by a camera array or a single moving camera. (b) After offline optimization, our framework can render novel views and perform temporal interpolation interactively. (c) Our framework is highly configurable: adopting the TensoRF-CP voxel representation [tensorf] yields low-bitrate streaming of high-quality renderings.
  • Figure 2: First row: We categorize the areas in a dynamic scene into three groups: deforming, new and static areas. Second row: Visualization of the self-supervised decomposition obtained from our framework. Red and blue areas indicate estimated high and low probabilities of a category.
  • Figure 3: A toy example of 2D dynamic sequence interpolation. The first row shows the 2D input sequence with missing frames. Without modeling deformation $d(\cdot)$, the second row fails to interpolate the rigid motion of '2022'. Without modeling newness $n(\cdot)$, the third row fails to interpolate the gradually appearing effect. Full decomposition handles both phenomena well.
  • Figure 4: The proposed streamable hybrid representation. A time-dependent sliding window is adopted for streaming the feature channels.
  • Figure 5: An overview of our framework. The newness field and decomposition field are implemented with the channel streaming technique proposed in Figure 4. A small MLP is adopted in the decomposition field for predicting the probabilities. The stationary field consists of a static feature volume for modeling time-invariant areas and a tiny MLP with time $t$ as input for modeling low-frequency time-varying appearance. The deformation field and radiance field are two small MLPs.
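The deformation failure mode in Figure 3 has a simple 1D analogue. Assume, for illustration only, that an object is a single bright pixel moving from position 2 in frame 0 to position 6 in frame 2. Filling the missing frame 1 by blending the neighbouring frames (a stand-in for interpolating appearance without a deformation model) yields two half-bright ghosts, while warping frame 0 by a displacement interpolated in time places one full-brightness pixel at position 4. The helpers `pulse`, `cross_fade`, and `warp` and the linear-motion displacement are hypothetical and only mimic the figure's reasoning.

```python
from typing import List


def pulse(n: int, center: int) -> List[float]:
    """1D 'image' of length n with a single bright pixel at `center`."""
    return [1.0 if i == center else 0.0 for i in range(n)]


def cross_fade(f0: List[float], f1: List[float], alpha: float) -> List[float]:
    """Interpolate a missing frame by blending pixel values (no motion model)."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(f0, f1)]


def warp(canonical: List[float], shift: int) -> List[float]:
    """Backward warp: output pixel i samples canonical[i - shift]; zero outside the image."""
    n = len(canonical)
    return [canonical[i - shift] if 0 <= i - shift < n else 0.0 for i in range(n)]


N = 10
frame0, frame2 = pulse(N, 2), pulse(N, 6)

# Missing frame at t = 1, halfway between the observed frames at t = 0 and t = 2.
blended = cross_fade(frame0, frame2, alpha=0.5)
# Hypothetical linear motion fitted from the two observed frames: 2 pixels per time step.
deformed = warp(frame0, shift=2 * 1)

print(blended)   # two half-bright ghosts at positions 2 and 6
print(deformed)  # one full-brightness pixel at position 4
```

Warping frame 0 can only move content that is already present, so gradually appearing content, which the Figure 3 caption attributes to the newness term $n(\cdot)$, is out of its reach in this toy setting.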