Streaming Radiance Fields for 3D Video Synthesis

Lingzhi Li, Zhen Shen, Zhongshu Wang, Li Shen, Ping Tan

TL;DR

The paper tackles online 3D video synthesis for dynamic scenes by introducing StreamRF, an incremental, explicit-grid radiance-field framework. It trains a base voxel grid on the first frame and learns per-frame deltas that adapt the model to each subsequent frame, using narrow-band tuning around the previous surface and a diff-based compression scheme to drastically reduce storage. A pilot-model guidance strategy further accelerates optimization and improves stability, enabling per-frame training in around 15 seconds and rendering at 1k resolution in about 120 ms, roughly a $1000 \times$ training speedup over state-of-the-art implicit methods. The approach demonstrates strong performance on the Meet Room and N3DV datasets with substantial storage savings, opening possibilities for online, on-the-fly 3D video synthesis in practical settings.

Abstract

We present an explicit-grid-based method for efficiently reconstructing streaming radiance fields for novel view synthesis of real-world dynamic scenes. Instead of training a single model that combines all the frames, we formulate the dynamic modeling problem with an incremental learning paradigm in which a per-frame model difference is trained to complement the adaptation of a base model to the current frame. By exploiting a simple yet effective tuning strategy with narrow bands, the proposed method realizes a feasible framework for handling video sequences on the fly with high training efficiency. The storage overhead induced by using explicit grid representations can be significantly reduced through model-difference-based compression. We also introduce an efficient strategy to further accelerate model optimization for each frame. Experiments on challenging video sequences demonstrate that our approach achieves a training speed of 15 seconds per frame with competitive rendering quality, a $1000 \times$ speedup over the state-of-the-art implicit methods. Code is available at https://github.com/AlgoHunt/StreamRF.
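The two ideas named in the abstract, narrow-band tuning and model-difference-based compression, can be illustrated with a toy sketch. This is not the StreamRF implementation: the dictionary-based sparse grid, the function names (`narrow_band`, `encode_diff`, `apply_diff`), the band `radius`, and the change `threshold` are all assumptions chosen for illustration, and the real method presumably works on GPU voxel grids that store density and color features rather than a single scalar.

```python
from itertools import product


def narrow_band(occupied, radius=1):
    """Return the occupied voxels plus every voxel within `radius` cells of one."""
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    return {
        (x + dx, y + dy, z + dz)
        for (x, y, z) in occupied
        for (dx, dy, dz) in offsets
    }


def encode_diff(prev, curr, band, threshold=1e-3):
    """Store only in-band voxels whose value changed by more than `threshold`."""
    diff = {}
    for voxel in band:
        delta = curr.get(voxel, 0.0) - prev.get(voxel, 0.0)
        if abs(delta) > threshold:
            diff[voxel] = delta
    return diff


def apply_diff(prev, diff):
    """Reconstruct the current frame's grid from the previous grid and a diff."""
    grid = dict(prev)
    for voxel, delta in diff.items():
        grid[voxel] = grid.get(voxel, 0.0) + delta
    return grid


# Toy sparse grids: voxel index -> density (hypothetical values).
prev_grid = {(0, 0, 0): 1.0, (1, 0, 0): 0.5}
curr_grid = {(0, 0, 0): 1.0, (1, 0, 0): 0.8, (2, 0, 0): 0.3, (9, 9, 9): 0.7}

band = narrow_band(prev_grid.keys(), radius=1)   # region allowed to change
diff = encode_diff(prev_grid, curr_grid, band)   # what would be stored per frame
recon = apply_diff(prev_grid, diff)              # what a decoder would rebuild
```

In this toy setup only the voxels that changed inside the band are stored for the new frame, which is where the storage saving comes from. The change at `(9, 9, 9)` lies outside the band and is dropped, which illustrates the trade-off of restricting updates to a neighborhood of the previous surface.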

Paper Structure (17 sections, 8 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Rendering results on test views. Top: our Meet Room dataset; Bottom: N3DV dataset. These novel-view results are rendered at interactive speed ($\sim$10 FPS) by StreamRF.
  • Figure 2: Left: PSNR comparison between tuning with (1) the original sparse grid, (2) a dense grid, and (3) our narrow-band fine-tuning on a 300-frame sequence; Right: visual comparison of the above methods on (a) the 40th frame and (b) the 240th frame. All results are trained from the same initial model.
  • Figure 3: Ablation study of diff-based compression. We compare per-frame PSNR (left) and storage size (right) with diff-based compression enabled and disabled. Storage drops to 0.5% of the original with a negligible difference in PSNR (an average decrease of $0.156$).
  • Figure 4: Ablation study of pilot model guidance. We compare StreamRF trained without pilot model guidance (left) and with it (right) on the N3DV dataset (top) and the Meet Room dataset (bottom). The results show that the pilot model helps reduce artifacts in both the static background and the dynamic foreground.
  • Figure 5: Failure Cases. There are visual artifacts in reconstructing high-frequency details and transparent/translucent objects.
  • ...and 4 more figures