MoRel: Long-Range Flicker-Free 4D Motion Modeling via Anchor Relay-based Bidirectional Blending with Hierarchical Densification

Sangwoon Kwak, Weeyoung Kwon, Jun Young Jeong, Geonho Kim, Won-Sik Cheong, Jihyong Oh

TL;DR

MoRel tackles the memory and temporal coherence challenges of long-range 4D motion modeling by introducing an anchor-based representation complemented by Anchor Relay-based Bidirectional Blending (ARBB) and Feature-variance-guided Hierarchical Densification (FHD). The method organizes global and local canonical spaces via a Global Canonical Anchor (GCA) and periodically placed Key-frame Anchors (KfA), and uses Progressive Windowed Deformation (PWD) with a learnable temporal opacity in Intermediate Frame Blending (IFB) to achieve smooth, flicker-free transitions while keeping memory usage bounded. A new SelfCap$_{\text{LR}}$ dataset enables robust evaluation of long-range motion and temporal coherence, with MoRel outperforming prior all-at-once and chunk-based approaches in both quality and efficiency. The results demonstrate MoRel’s practical potential for real-world, long-duration dynamic scenes with scalable, on-demand loading and reduced computational demands.
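To make the "periodically placed Key-frame Anchors" and "on-demand loading" ideas concrete, here is a minimal sketch of how a renderer might pick the two anchors that bracket a requested frame. The function name, the fixed `interval`, and the choice to place the last key-frame on the final frame are illustrative assumptions, not details taken from the paper.

```python
def neighboring_keyframes(frame, interval, num_frames):
    """Return the (left, right) key-frame indices that bracket `frame`.

    Assumption (hypothetical layout): key-frames sit at 0, interval,
    2 * interval, ... and the final frame closes the last window.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not 0 <= frame < num_frames:
        raise ValueError("frame is outside the sequence")
    left = (frame // interval) * interval
    right = min(left + interval, num_frames - 1)
    return left, right
```

Under these assumptions, only the anchors at `left` and `right` need to be resident to render `frame`, which is one way memory can stay bounded regardless of sequence length.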

Abstract

Recent advances in 4D Gaussian Splatting (4DGS) have extended the high-speed rendering capability of 3D Gaussian Splatting (3DGS) into the temporal domain, enabling real-time rendering of dynamic scenes. However, a major remaining challenge is modeling dynamic videos that contain long-range motion, where a naive extension of existing methods leads to severe memory explosion, temporal flickering, and failure to handle occlusions that appear or disappear over time. To address these challenges, we propose MoRel, a novel 4DGS framework built on an Anchor Relay-based Bidirectional Blending (ARBB) mechanism, which enables temporally consistent and memory-efficient modeling of long-range dynamic scenes. Our method progressively constructs locally canonical anchor spaces at key-frame time indices and models inter-frame deformations at the anchor level, enhancing temporal coherence. By learning bidirectional deformations between Key-frame Anchors (KfAs) and adaptively blending them through learnable opacity control, our approach mitigates temporal discontinuities and flickering artifacts. We further introduce a Feature-variance-guided Hierarchical Densification (FHD) scheme that effectively densifies KfAs while preserving rendering quality, based on an assigned feature-variance level. To evaluate our model's ability to handle real-world long-range 4D motion, we compose a new dataset containing long-range 4D motion, called SelfCap$_{\text{LR}}$. It has a larger average dynamic motion magnitude and is captured in spatially wider spaces than previous dynamic video datasets. Overall, MoRel achieves temporally coherent and flicker-free long-range 4D reconstruction while maintaining bounded memory usage, demonstrating both scalability and efficiency in dynamic Gaussian-based representations.
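The blending idea in the abstract can be illustrated with a small sketch: between two neighboring key-frames, the opacities of the left anchor fade out while those of the right anchor fade in. The sigmoid form and the `center` and `sharpness` parameters (standing in for learnable quantities) are assumptions made for illustration; the abstract does not specify the actual parameterization.

```python
import math


def temporal_opacity_weights(t, t_left, t_right, center=0.5, sharpness=8.0):
    """Return (w_left, w_right), which always sum to 1.

    `center` and `sharpness` are hypothetical stand-ins for learnable
    parameters that control where and how fast the hand-over happens.
    """
    if not t_left < t_right:
        raise ValueError("t_left must be smaller than t_right")
    # Normalize time to [0, 1] inside the window between the two anchors.
    s = min(max((t - t_left) / (t_right - t_left), 0.0), 1.0)
    w_right = 1.0 / (1.0 + math.exp(-sharpness * (s - center)))
    return 1.0 - w_right, w_right


def blended_opacities(opacities_left, opacities_right, t, t_left, t_right):
    """Scale each anchor's Gaussian opacities by its temporal weight."""
    w_left, w_right = temporal_opacity_weights(t, t_left, t_right)
    return ([w_left * o for o in opacities_left],
            [w_right * o for o in opacities_right])
```

Because the weights vary continuously with `t`, neither anchor switches on or off abruptly at a boundary, which is the intuition behind avoiding flicker. Note that with a plain sigmoid the weights approach, but do not exactly reach, 0 and 1 at the window ends.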

Paper Structure

This paper contains 31 sections, 4 equations, 12 figures, 6 tables, 2 algorithms.

Figures (12)

  • Figure 1: Approaches for modeling long-range 4D motion. (a) All-at-once training runs into memory overflow and also suffers from limited representational capacity. (b) Chunk-based training mitigates the memory overflow but causes temporal flickering at chunk boundaries, substantially degrading visual quality. In contrast, (c) our Anchor Relay-based Bidirectional Blending (ARBB) approach maintains both representation quality and temporal consistency by smoothly transitioning the influence of each Key-frame Anchor (KfA). The rendered patches, frame-wise tOF (Chu et al., 2018), and temporal profile provide strong evidence for the effectiveness of our method.
  • Figure 2: Conceptual comparison of existing 4DGS methods in modeling long-range 4D motion. (a) All-at-once approaches suffer from high memory usage, while (b) chunk-based methods inevitably fail to maintain temporal consistency. Even advanced variants struggle with system-level applicability, such as random access. Our ARBB framework resolves all these issues, achieving bounded memory and temporally coherent long-range modeling.
  • Figure 3: Overview of the MoRel framework. To efficiently model long-range 4D motion with bounded memory and temporal consistency, MoRel adopts the Anchor Relay-based Bidirectional Blending (ARBB) strategy, composed of four training stages organized into two phases. In the Anchor Relay phase (Sec. \ref{sec:anchor relay}), a GCA is first trained on all frames with a single point cloud. Next, each KfA is derived around its key-frame time index, while its spatial detail is enhanced through FHD (Sec. \ref{sec:hierarchical densification}). In the Bidirectional Blending phase (Sec. \ref{sec:bidirectional deformation}), the PWD training stage learns bidirectional deformation fields within local temporal windows to ensure robust motion modeling for each anchor. Finally, in the IFB training stage, each pair of neighboring anchors is fused through a learnable temporal opacity control that smoothly transitions anchor influence over time, eliminating temporal flickering across chunks.
  • Figure 4: Comparison of training strategies for modeling long-range 4D motion with bidirectional deformation. (a) All-at-once training suffers from memory overflow. (b) Chunk-wise training reduces memory cost but causes inter-chunk interference. (c) Our Bidirectional Blending (PWD + IFB) maintains bounded memory and prevents inter-chunk interference.
  • Figure 5: Overview of Feature-variance-guided Hierarchical Densification. (a) Variance-based Leveling: After GCA training, we assign a level to each anchor point guided by its feature variance. (b) Level-wise Densification: During the KfA and PWD training stages, gradients for KfA densification are modulated by level-specific weights, enabling early low-frequency stabilization and late high-frequency refinement.
  • ...and 7 more figures
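
The two steps named in the Figure 5 caption (variance-based leveling, then level-wise gradient weighting) can be sketched roughly as follows. The rank-based binning, the step-shaped schedule, and all names and default values here are assumptions chosen for illustration; the caption does not give the paper's actual thresholds or weights.

```python
from statistics import pvariance


def assign_levels(features, num_levels=3):
    """Assign each anchor point a level from the variance of its feature vector.

    Assumption: levels come from equal-sized rank bins, so level 0 holds the
    lowest-variance anchors and the top level the highest-variance ones.
    """
    variances = [pvariance(f) for f in features]
    order = sorted(range(len(variances)), key=lambda i: variances[i])
    levels = [0] * len(features)
    for rank, idx in enumerate(order):
        levels[idx] = min(rank * num_levels // len(features), num_levels - 1)
    return levels


def level_weight(level, progress, num_levels=3, inactive_weight=0.1):
    """Weight applied to the densification gradient of one anchor point.

    Hypothetical schedule: a level receives full weight once training
    `progress` (0 to 1) reaches level / num_levels, and a small weight before.
    """
    return 1.0 if progress >= level / num_levels else inactive_weight
```

With this schedule, low-variance (low-level) anchors densify from the start, while high-variance anchors are held back until later in training, mirroring the "early low-frequency stabilization and late high-frequency refinement" described in the caption.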