HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene

Jianing Chen, Zehao Li, Yujun Cai, Hao Jiang, Chengxuan Qian, Juyuan Kang, Shuqin Gao, Honglong Zhao, Tianlu Mao, Yucheng Zhang

TL;DR

HAIF-GS, a unified framework that enables structured and consistent dynamic modeling through sparse anchor-driven deformation, significantly outperforms prior dynamic 3DGS methods in rendering quality, temporal coherence, and reconstruction efficiency.

Abstract

Reconstructing dynamic 3D scenes from monocular videos remains a fundamental challenge in 3D vision. While 3D Gaussian Splatting (3DGS) achieves real-time rendering in static settings, extending it to dynamic scenes is challenging due to the difficulty of learning structured and temporally consistent motion representations. This challenge often manifests as three limitations in existing methods: redundant Gaussian updates, insufficient motion supervision, and weak modeling of complex non-rigid deformations. These issues collectively hinder coherent and efficient dynamic reconstruction. To address these limitations, we propose HAIF-GS, a unified framework that enables structured and consistent dynamic modeling through sparse anchor-driven deformation. It first identifies motion-relevant regions via an Anchor Filter to suppress redundant updates in static areas. A self-supervised Induced Flow-Guided Deformation module induces anchor motion using multi-frame feature aggregation, eliminating the need for explicit flow labels. To further handle fine-grained deformations, a Hierarchical Anchor Propagation mechanism increases anchor resolution based on motion complexity and propagates multi-level transformations. Extensive experiments on synthetic and real-world benchmarks validate that HAIF-GS significantly outperforms prior dynamic 3DGS methods in rendering quality, temporal coherence, and reconstruction efficiency.
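The idea of sparse anchor-driven deformation can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `deform_points`, the confidence threshold, the k-nearest-anchor selection, and the Gaussian distance weighting are all assumptions made for illustration, and only translations are modeled. Anchors with low motion confidence are filtered out, and each point moves by a normalized blend of the offsets of its nearest remaining anchors.

```python
import math


def deform_points(points, anchors, anchor_offsets, anchor_conf,
                  k=2, conf_threshold=0.5, sigma=1.0):
    """Blend per-anchor offsets onto points (illustrative sketch only).

    points, anchors, anchor_offsets: sequences of 3-tuples.
    anchor_conf: one hypothetical motion confidence in [0, 1] per anchor.
    """
    # Assumed anchor filter: keep anchors whose confidence passes the threshold.
    dynamic = [i for i, c in enumerate(anchor_conf) if c >= conf_threshold]

    deformed = []
    for p in points:
        # Pick the k nearest dynamic anchors by Euclidean distance.
        nearest = sorted(dynamic, key=lambda i: math.dist(p, anchors[i]))[:k]
        # Assumed weighting: Gaussian falloff with distance to each anchor.
        weights = [math.exp(-math.dist(p, anchors[i]) ** 2 / (2 * sigma ** 2))
                   for i in nearest]
        total = sum(weights)
        if total < 1e-8:
            # No dynamic anchor close enough: treat the point as static.
            deformed.append(tuple(p))
            continue
        # Normalized weighted average of the selected anchor offsets.
        offset = [sum(w * anchor_offsets[i][d] for w, i in zip(weights, nearest)) / total
                  for d in range(3)]
        deformed.append(tuple(p[d] + offset[d] for d in range(3)))
    return deformed
```

In this sketch, points far from every dynamic anchor are left untouched, which mirrors the stated goal of suppressing redundant updates in static areas. The actual method predicts full transformations from multi-frame features and refines anchors hierarchically, neither of which is modeled here.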

Paper Structure

This paper contains 30 sections, 12 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Visualization results on (a) the NeRF-DS dataset and (b) the D-NeRF dataset.
  • Figure 2: The overview of our HAIF-GS. Given the canonical Gaussians, we first initialize sparse motion anchors and filter them using a confidence-aware Anchor Filter. We then aggregate multi-frame features to predict temporally consistent transformations for each anchor. In regions with complex motion, we hierarchically densify anchors and propagate deformations across layers. Finally, we update Gaussian parameters via anchor-based interpolation and render images for supervision.
  • Figure 3: Qualitative comparison on the NeRF-DS dataset. Compared with other SOTA methods, our method reconstructs finer details and produces a structured rendering of the moving objects.
  • Figure 3: Ablations on the key components of our proposed framework on the NeRF-DS dataset.
  • Figure 4: Qualitative comparison on the D-NeRF dataset.
  • ...and 1 more figure