
ReFlow: Self-correction Motion Learning for Dynamic Scene Reconstruction

Yanzhe Liang, Ruijie Zhu, Hanzhi Chang, Zhuoyuan Li, Jiahao Lu, Tianzhu Zhang

Abstract

We present ReFlow, a unified framework for monocular dynamic scene reconstruction that learns 3D motion in a novel self-correction manner from raw video. Existing methods often suffer from incomplete scene initialization in dynamic regions, leading to unstable reconstruction and motion estimation; to compensate, they often resort to external dense motion guidance, such as pre-computed optical flow, to stabilize and constrain the reconstruction of dynamic components. However, this introduces additional complexity and potential error propagation. To address these issues, ReFlow integrates a Complete Canonical Space Construction module for enhanced initialization of both static and dynamic regions, and a Separation-Based Dynamic Scene Modeling module that decouples static and dynamic components for targeted motion supervision. The core of ReFlow is a novel self-correction flow matching mechanism, consisting of Full Flow Matching to align 3D scene flow with time-varying 2D observations, and Camera Flow Matching to enforce multi-view consistency for static objects. Together, these modules enable robust and accurate dynamic scene reconstruction. Extensive experiments across diverse scenarios demonstrate that ReFlow achieves superior reconstruction quality and robustness, establishing a novel self-correction paradigm for monocular 4D reconstruction.


Paper Structure

This paper contains 35 sections, 29 equations, 13 figures, 9 tables.

Figures (13)

  • Figure 1: Typical challenges in monocular dynamic scene reconstruction. Top: Incomplete initialization for dynamic regions: the initial 3D structure from SfM often misses dynamic components and initializes Gaussians without separating static points (green) from dynamic points (red), leading to an entangled and incomplete representation. Bottom: To compensate, existing methods frequently resort to external dense motion guidance to constrain and stabilize the reconstruction of dynamic regions.
  • Figure 2: Motivation of Self-correction Flow Matching. (a) We start with a simple observation: 2D observations, such as the shifting balloon, are caused by 3D motion. Accurately reconstructed 3D motion should naturally align with these visible changes. (b) Unlike previous methods that use external motion priors to supervise 3D motion, we instead use raw video as motion supervision through a self-correction flow matching mechanism that directly aligns predicted 3D motion projections with 2D frame differences.
  • Figure 3: Overview of ReFlow. We start by constructing a complete canonical space (Sec. \ref{sec:canonical}), which includes both static and dynamic components, ensuring a reliable 3D scene initialization. Next, we disentangle these elements using spatial and spatiotemporal feature planes (Sec. \ref{sec:modeling}), providing a structured representation that separately handles static and dynamic regions. This preparation allows us to introduce targeted motion constraints (Sec. \ref{sec:flowmatching}): Full Flow supervises motion across the entire scene, while Camera Flow enforces consistency in static regions, enabling the self-correction learning mechanism for accurate 3D motion reconstruction.
  • Figure 4: Self-correction flow matching mechanism. (a) Different Motion and Flow in the 4D Scene. Static areas move only due to camera motion (camera flow), while dynamic areas involve both camera and object motion (full flow). Accurate motion learning requires region-specific flow supervision. (b) Self-correction flow matching. We apply full flow to warp the entire image from state $t_{1}$ to state $t_{2}$ and compare with the real observation, validating overall motion. Camera flow is used similarly but only on static regions, ensuring their stability. Together, these provide a complementary self-correction signal for 3D motion learning.
  • Figure 5: Qualitative comparison on the Nvidia Monocular dataset [gao2021dynamicviewsynthesisdynamic]. Yellow boxes highlight zoomed-in regions for detail examination. Per-scene average PSNR values are provided.
  • ...and 8 more figures
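The self-correction signal described in Figure 4 can be illustrated with a minimal sketch: backward-warp the observed frame at state $t_{2}$ by a predicted flow field to synthesize state $t_{1}$, then penalize the photometric difference, applying the full flow over the whole image and the camera flow over static regions only. The function names, nearest-neighbor sampling, and L1 photometric error below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def warp_with_flow(img, flow):
    """Backward-warp a grayscale image (H, W) by a per-pixel flow field
    (H, W, 2), where flow[..., 0] / flow[..., 1] are the x / y displacements
    from t1 to t2. Nearest-neighbor sampling keeps the sketch dependency-free;
    a real implementation would use differentiable bilinear sampling."""
    H, W = img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    return img[src_y, src_x]

def self_correction_loss(frame_t1, frame_t2, full_flow, camera_flow, static_mask):
    """Full flow must explain the change over the entire frame; camera flow
    must explain it on static regions only (static_mask is 1 where static)."""
    full_err = np.abs(warp_with_flow(frame_t2, full_flow) - frame_t1)
    cam_err = np.abs(warp_with_flow(frame_t2, camera_flow) - frame_t1) * static_mask
    return full_err.mean() + cam_err.sum() / max(static_mask.sum(), 1.0)
```

With a correct full flow and a static mask excluding the moving object, both terms vanish, which is the self-consistency condition the mechanism enforces.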