DRSM: efficient neural 4D decomposition for dynamic reconstruction in stationary monocular cameras

Weixing Xie, Xiao Dong, Yong Yang, Qiqin Lin, Jingze Chen, Junfeng Yao, Xiaohu Guo

TL;DR

The paper tackles dynamic scene reconstruction from a single stationary monocular video, a problem that is ill-posed because one fixed view offers limited geometric cues and suffers from occlusion. It introduces DRSM, a neural 4D decomposition that decouples static and dynamic content via planar factorization into six feature planes, enabling efficient feature fusion for a space-time point $v=(x,y,z,t)$ and NeRF-style rendering. Key contributions include the planar static/dynamic decomposition, depth-based motion constraints, and an occlusion- and dynamics-aware ISDM sampling strategy. Quantitative comparisons and ablation studies show improved fidelity and training efficiency. The approach supports realistic rendering and potential editing of dynamic scenes from monocular footage, with applications in video synthesis and analysis.
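As a rough illustration of the planar factorization mentioned above, the following minimal sketch queries a point $v=(x,y,z,t)$ against six 2D feature planes. It is not the authors' implementation. The plane names (`xy`, `xz`, `yz`, `xt`, `yt`, `zt`), the split into three spatial (static) and three space-time (dynamic) planes, the grid resolution, the channel count, and the use of concatenation for fusion are all assumptions made for illustration; the summary does not specify them.

```python
import random

# Hypothetical plane layout: three spatial planes (static) and three
# space-time planes (dynamic). This split is an assumption, not taken
# from the paper.
PLANE_NAMES = ["xy", "xz", "yz", "xt", "yt", "zt"]


def make_plane(res, channels, rng):
    """Build a res x res grid of feature vectors with random values."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(channels)]
             for _ in range(res)] for _ in range(res)]


def bilinear(plane, u, v):
    """Bilinearly interpolate a feature vector at (u, v) in [0, 1]^2."""
    res = len(plane)
    # Clamp to [0, 1], then map to continuous grid coordinates.
    gu = min(max(u, 0.0), 1.0) * (res - 1)
    gv = min(max(v, 0.0), 1.0) * (res - 1)
    i0, j0 = int(gu), int(gv)
    i1, j1 = min(i0 + 1, res - 1), min(j0 + 1, res - 1)
    a, b = gu - i0, gv - j0
    return [
        (1 - a) * (1 - b) * plane[i0][j0][c]
        + a * (1 - b) * plane[i1][j0][c]
        + (1 - a) * b * plane[i0][j1][c]
        + a * b * plane[i1][j1][c]
        for c in range(len(plane[0][0]))
    ]


def query(planes, x, y, z, t):
    """Return (static, dynamic) features for a point v = (x, y, z, t).

    Features from planes without a time axis are concatenated into the
    static vector, the rest into the dynamic vector (assumed fusion).
    """
    coords = {"x": x, "y": y, "z": z, "t": t}
    static, dynamic = [], []
    for name in PLANE_NAMES:
        feat = bilinear(planes[name], coords[name[0]], coords[name[1]])
        (dynamic if "t" in name else static).extend(feat)
    return static, dynamic


rng = random.Random(0)
planes = {name: make_plane(8, 4, rng) for name in PLANE_NAMES}

# Same spatial location at two different times.
static_feat, dynamic_feat = query(planes, 0.2, 0.5, 0.7, 0.0)
static_later, dynamic_later = query(planes, 0.2, 0.5, 0.7, 0.9)
```

In this sketch the static feature is unchanged when only `t` varies, while the dynamic feature changes. That is the property a static/dynamic decomposition relies on. A NeRF-style renderer would then decode such features into density and color along each camera ray.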

Abstract

With the popularity of monocular videos generated by video sharing and live broadcasting applications, reconstructing and editing dynamic scenes in stationary monocular cameras has become a special but anticipated technology. In contrast to scene reconstructions that exploit multi-view observations, the problem of modeling a dynamic scene from a single view is significantly more under-constrained and ill-posed. Inspired by recent progress in neural rendering, we present a novel framework to tackle the 4D decomposition problem for dynamic scenes in monocular cameras. Our framework utilizes decomposed static and dynamic feature planes to represent 4D scenes and emphasizes the learning of dynamic regions through dense ray casting. Inadequate 3D clues from a single view and occlusion are also particular challenges in scene reconstruction. To overcome these difficulties, we propose deep supervised optimization and ray casting strategies. In experiments on various videos, our method generates higher-fidelity results than existing methods for single-view dynamic scene representation.

Paper Structure

This paper contains 8 sections, 12 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Framework of the proposed DRSM.
  • Figure 2: Comparison of DRSM and other methods on dynamic reconstruction results. We remove the hand from the video and show the PSNR of each method.
  • Figure 3: Ablation study on a marionette dancing video. We remove the manipulating wires and show the reconstructed point clouds.