NeRFPlayer: A Streamable Dynamic Scene Representation with Decomposed Neural Radiance Fields
Liangchen Song, Anpei Chen, Zhong Li, Zhang Chen, Lele Chen, Junsong Yuan, Yi Xu, Andreas Geiger
TL;DR
NeRFPlayer tackles fast, free-viewpoint rendering of dynamic scenes from limited RGB input by decomposing spatiotemporal space into static, deforming, and new regions, each represented by a separate neural field. It couples this temporal decomposition with a streamable hybrid representation that uses a sliding window over feature channels to enable compact, frame-by-frame streaming and interpolation. The method is trained with a reconstruction loss plus a global parsimony regularization and demonstrates competitive rendering quality with substantially faster reconstruction and rendering on both multi-camera and single-camera datasets; ablations confirm the value of the three-region decomposition and the streaming scheme. This work enables efficient, interactive exploration of dynamic scenes in VR using modest capture setups and streaming bandwidth, with potential for broader dynamic-NeRF applications.
Abstract
Freely exploring a real-world 4D spatiotemporal space in VR has been a long-term quest. The task is especially appealing when only a few RGB cameras, or even a single one, are used to capture the dynamic scene. To this end, we present an efficient framework capable of fast reconstruction, compact modeling, and streamable rendering. First, we propose to decompose the 4D spatiotemporal space according to temporal characteristics. Points in the 4D space are associated with probabilities of belonging to three categories: static, deforming, and new areas. Each area is represented and regularized by a separate neural field. Second, we propose a feature streaming scheme based on hybrid representations for efficiently modeling the neural fields. Our approach, coined NeRFPlayer, is evaluated on dynamic scenes captured by single hand-held cameras and multi-camera arrays, achieving rendering quality and speed comparable or superior to recent state-of-the-art methods, with reconstruction in 10 seconds per frame and interactive rendering.
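The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`blend_fields`, `window_channels`), the softmax over raw logits, and the `window`/`stride` parameters are all assumptions made for illustration. The first function blends the outputs of three hypothetical per-category fields (static, deforming, new) using the point's category probabilities; the second selects the slice of feature channels that a given frame would read under a sliding window, so that consecutive frames share most channels and only a few new ones would need to be streamed.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_fields(logits, field_outputs):
    """Blend per-category field outputs by category probability.

    logits: three scores for (static, deforming, new); hypothetical input.
    field_outputs: three equal-length lists, one per category field.
    """
    probs = softmax(logits)
    n = len(field_outputs[0])
    return [
        sum(p * out[i] for p, out in zip(probs, field_outputs))
        for i in range(n)
    ]


def window_channels(features, frame, window, stride):
    """Return the feature channels read by `frame` (assumed layout).

    Frame t reads channels [t * stride, t * stride + window), so adjacent
    frames overlap in window - stride channels.
    """
    start = frame * stride
    return features[start:start + window]


# A point that is equally likely to be static, deforming, or new.
blended = blend_fields([0.0, 0.0, 0.0], [[3.0], [6.0], [9.0]])

# Ten feature channels, window of four, two new channels per frame.
channels = list(range(10))
frame2 = window_channels(channels, frame=2, window=4, stride=2)
frame3 = window_channels(channels, frame=3, window=4, stride=2)
shared = [c for c in frame2 if c in frame3]
```

In this toy setup, moving from frame 2 to frame 3 reuses the channels in `shared` and introduces only `stride` new ones, which is the property that makes per-frame streaming compact under these assumptions.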
