
TRiGS: Temporal Rigid-Body Motion for Scalable 4D Gaussian Splatting

Suwoong Yeom, Joonsik Nam, Seunggyu Choi, Lucas Yunkyu Lee, Sangmin Kim, Jaesik Park, Joonsoo Kim, Kugjin Yun, Kyeongbo Kong, Sukju Kang

Abstract

Recent 4D Gaussian Splatting (4DGS) methods achieve impressive dynamic scene reconstruction but often rely on piecewise linear velocity approximations and short temporal windows. This disjointed modeling leads to severe temporal fragmentation, forcing primitives to be repeatedly eliminated and regenerated to track complex nonlinear dynamics. This makeshift approximation eliminates the long-term temporal identity of objects and causes an inevitable proliferation of Gaussians, hindering scalability to extended video sequences. To address this, we propose TRiGS, a novel 4D representation that utilizes unified, continuous geometric transformations. By integrating $SE(3)$ transformations, hierarchical Bézier residuals, and learnable local anchors, TRiGS models geometrically consistent rigid motions for individual primitives. This continuous formulation preserves temporal identity and effectively mitigates unbounded memory growth. Extensive experiments demonstrate that TRiGS achieves high-fidelity rendering on standard benchmarks while uniquely scaling to extended video sequences (e.g., 600 to 1200 frames) without severe memory bottlenecks, significantly outperforming prior works in temporal stability.

Paper Structure

This paper contains 39 sections, 46 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Comparison of temporal modeling for extended dynamic scenes. (Left) While previous methods like FTGS rely on fragmented piecewise linear approximations that force primitives to repeatedly reset, our continuous rigid-body transformations preserve the long-term temporal identity of active Gaussians. (Right) By eliminating the need for unnecessary proliferation, TRiGS maintains a strictly constant, compact memory footprint while sustaining high rendering quality across extended sequences (up to 1200 frames), in stark contrast to baselines that suffer from severe memory bloat and performance drops. (Bottom) Consequently, our approach delivers temporally stable, high-fidelity novel view synthesis without visual degradation or motion artifacts.
  • Figure 2: Overview of our framework. We comprehensively model the scene dynamics through three key components. First, the motion is parameterized via a hierarchical decomposition using Bézier residuals within an effective temporal window. Second, this Lie algebra representation is mapped to a coupled $SE(3)$ transformation. Third, the transformation is applied relative to a gauge-fixed local anchor ($a_{i,\perp}$) to ensure stable, articulate deformation. Finally, the deformed primitives are rendered to optimize both photometric and motion regularization objectives.
  • Figure 3: Qualitative comparison on the SelfCap 1200-frame scenes.
  • Figure 4: Qualitative results on the N3V dataset.
  • Figure 5: Visual ablation on the SelfCap bike2 scene. Compared to incomplete configurations that suffer from motion artifacts and blurriness, Full TRiGS successfully preserves sharp geometric details and structural integrity.
  • ...and 3 more figures
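The pipeline described in the Figure 2 caption — a Bézier-interpolated twist in the Lie algebra, mapped through the $SE(3)$ exponential, then applied relative to a local anchor — can be sketched generically. This is a minimal illustration of that class of motion model, not the paper's implementation: the function names, the cubic Bézier order, and the anchor-relative application `a + R(p - a) + t` are all assumptions for exposition.

```python
import numpy as np

def bezier(control_points, t):
    """De Casteljau evaluation of a Bezier curve at t in [0, 1]."""
    pts = [np.asarray(p, dtype=float) for p in control_points]
    while len(pts) > 1:
        pts = [(1 - t) * p + t * q for p, q in zip(pts[:-1], pts[1:])]
    return pts[0]

def hat(w):
    """Map a vector in R^3 to its 3x3 skew-symmetric matrix."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def se3_exp(xi):
    """Exponential map from a twist xi = (rho, omega) in R^6 to a 4x4 SE(3) matrix."""
    rho, omega = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    W = hat(omega)
    if theta < 1e-8:
        # First-order approximation near the identity.
        R = np.eye(3) + W
        V = np.eye(3) + 0.5 * W
    else:
        # Rodrigues' formula for the rotation and its left Jacobian.
        R = (np.eye(3) + np.sin(theta) / theta * W
             + (1 - np.cos(theta)) / theta**2 * W @ W)
        V = (np.eye(3) + (1 - np.cos(theta)) / theta**2 * W
             + (theta - np.sin(theta)) / theta**3 * W @ W)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

def deform(point, anchor, control_twists, t):
    """Rigidly move `point` about `anchor` by the Bezier-interpolated twist at time t."""
    T = se3_exp(bezier(control_twists, t))
    return anchor + T[:3, :3] @ (point - anchor) + T[:3, 3]
```

Because the motion is a single continuous curve in the Lie algebra rather than a chain of piecewise linear segments, a primitive's trajectory stays smooth over the whole temporal window and its identity never needs to be reset mid-sequence.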